Few Questions

#2
by yukiarimo - opened
  1. Does it support video?
  2. Is THIS model better than Qwen 3 VL 4B, or should I stick with Qwen instead (if I want the better all-around model to do full SFT on)?
  3. Is the ViT custom or SigLIP?

Does it support video?
No

Is THIS model better than Qwen 3 VL 4B, or should I stick with Qwen instead (if I want the better all-around model to do full SFT on)?
You should probably try out both models for your use case.

Is the ViT custom or SigLIP?
The image encoder is a custom-trained image encoder based on Pixtral.

Thanks. But I can still load a video as multiple images sampled at 1 FPS, like in Qwen, right?
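
For reference, sampling a video at 1 FPS comes down to keeping roughly one frame per second of footage. The sketch below only computes which frame indices to keep; the function name `sample_frame_indices` and its parameters are hypothetical, and it assumes you already know the clip's frame count and native frame rate. Whether this model accepts several images in one prompt is not confirmed in this thread, so treat it as an illustration of the sampling step only.

```python
def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float = 1.0) -> list[int]:
    """Pick the frame indices to keep when downsampling a clip to target_fps.

    Hypothetical helper for illustration; not part of any model's API.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []

    # Number of source frames between two kept frames.
    # Clamped to 1 so the same frame is never selected twice.
    step = max(1.0, native_fps / target_fps)

    indices = []
    k = 0
    while True:
        idx = int(k * step)  # floor to the nearest earlier frame
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices


# A 3-second clip at 30 FPS keeps one frame per second.
print(sample_frame_indices(90, 30.0))
```

The selected frames would then be decoded with a video library of your choice and passed to the model as separate images, assuming the model and its processor handle multi-image input.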

yukiarimo changed discussion status to closed
