---
license: apache-2.0
tags:
- multimodal
- vision-language
- video understanding
- visuospatial cognition
- spatial reasoning
- vlm
- llava
- qwen
- siglip
- hiera
- sam2
- dual-encoder
datasets:
- nkkbr/ViCA-thinking-2.68k
language:
- en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA2-7B-Thinking
---

## Usage and Full Documentation

For a detailed model description, training setup, datasets, evaluation results, and inference code, **please refer to the following links**:

[![GitHub](https://img.shields.io/badge/GitHub-ViCA2-181717?logo=github&logoColor=white)](https://github.com/nkkbr/ViCA)
[![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ViCA2-blue)](https://huggingface.co/nkkbr/ViCA2)
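As a quick orientation before consulting the full documentation, the sketch below shows how a LLaVA-style checkpoint like this one is typically loaded. It is an assumption based on the LLaVA-NeXT-style `load_pretrained_model` helper (the `llava` tag above suggests this lineage); the `model_name` string and loader arguments here are illustrative, and the inference code in the linked GitHub repository is authoritative.

```python
# Hedged sketch: assumes the ViCA codebase exposes a LLaVA-NeXT-style loader.
# The authoritative inference code lives at https://github.com/nkkbr/ViCA.
from llava.model.builder import load_pretrained_model

# In the LLaVA-NeXT codebase, load_pretrained_model returns
# (tokenizer, model, image_processor, context_len).
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="nkkbr/ViCA2-7B-Thinking",  # this model card's checkpoint
    model_base=None,
    model_name="llava_qwen",  # assumed: Qwen-backed LLaVA variant per the tags
    device_map="auto",
)
model.eval()
```

Prompt construction and video frame preprocessing are model-specific, so follow the repository's examples for those steps rather than generic `transformers` pipelines.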