Few Questions
#2, opened by yukiarimo
- Does it support video?
- Is this model better than Qwen 3 VL 4B, or should I stick with Qwen instead (if I want the better all-round model to do full SFT on)?
- Is the ViT custom or SigLIP?
Does it support video?
No.
Is this model better than Qwen 3 VL 4B, or should I stick with Qwen instead (if I want the better all-round model to do full SFT on)?
You should probably try out both models on your use case.
Is the ViT custom or SigLIP?
The image encoder is custom-trained and based on Pixtral.
Thanks. But I can still load video as multiple 1 FPS images like in Qwen, right?
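For anyone trying this, here is a minimal sketch of the frame-selection step only: picking which frame indices to keep so that roughly one frame per second remains. The function name `sample_frame_indices` and its parameters are hypothetical, not part of any library, and the thread does not confirm whether this model accepts multiple images per prompt, so treat that part as an open assumption.

```python
def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float = 1.0) -> list[int]:
    """Return indices of frames to keep at roughly `target_fps` frames per second.

    Hypothetical helper for illustration; decoding the frames themselves
    (e.g. with a video library of your choice) is left out on purpose.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []

    # Number of source frames between two kept frames. Clamped to 1 so that
    # asking for more frames than the video has never yields duplicates.
    step = max(native_fps / target_fps, 1.0)

    indices = []
    k = 0
    while True:
        idx = int(k * step)  # multiply instead of accumulating to avoid float drift
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# A 3-second clip at 30 FPS keeps one frame per second.
print(sample_frame_indices(total_frames=90, native_fps=30.0))
```

The selected frames would then be decoded and passed to the model as separate images, if multi-image input turns out to be supported.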
yukiarimo changed discussion status to closed