---
license: apache-2.0
tags:
- multimodal
- vision-language
- video understanding
- visuospatial cognition
- spatial reasoning
- vlm
- llava
- qwen
- siglip
- hiera
- sam2
- dual-encoder
datasets:
- nkkbr/ViCA-thinking-2.68k
language:
- en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA2-7B-Thinking
---
## Usage and Full Documentation

For the detailed model description, training setup, datasets, evaluation results, and inference code, **please refer to the following links**:

[GitHub Repository](https://github.com/nkkbr/ViCA)

[Model on Hugging Face](https://huggingface.co/nkkbr/ViCA2)
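The inference code in the repository above is authoritative. For orientation only, below is a minimal, untested sketch of what loading and querying the checkpoint with 🤗 Transformers might look like. The Hub id `nkkbr/ViCA2-7B-Thinking`, the `trust_remote_code=True` loading path, the processor's `videos=` argument, and the prompt format are all assumptions here; defer to the GitHub repository where they differ.

```python
# Minimal sketch, not the official inference code (see the GitHub repo above).
# Assumptions: the checkpoint loads through the standard Transformers auto
# classes with trust_remote_code=True, and the processor accepts video frames.
import numpy as np
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "nkkbr/ViCA2-7B-Thinking"  # assumed Hub id for this card's checkpoint

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Placeholder input: 8 blank frames. In practice, sample frames from your video
# (e.g., with decord or OpenCV) at whatever rate the repo's inference code uses.
video_frames = [
    Image.fromarray(np.zeros((336, 336, 3), dtype=np.uint8)) for _ in range(8)
]

prompt = "How many chairs are in the room, and where are they relative to the table?"
inputs = processor(text=prompt, videos=video_frames, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```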