Update README.md
README.md CHANGED
@@ -4,14 +4,14 @@ license: apache-2.0
 
 <div align='center'>
 <h1>EVEv2: Improved Baselines for Encoder-Free Vision-Language Models</h1>
-<h3><a href="https://
+<h3><a href="https://arxiv.org/abs/2502.06788">EVEv2: Improved Baselines for Encoder-Free Vision-Language Models</a></h3>
 
 [Haiwen Diao*](https://scholar.google.com/citations?user=46eCjHQAAAAJ&hl=zh-CN), [Xiaotong Li*](https://scholar.google.com/citations?hl=zh-CN&user=cpCE_T4AAAAJ), [Yufeng Cui*](https://scholar.google.com/citations?user=5Ydha2EAAAAJ&hl=zh-CN&oi=ao), [Yueze Wang*](https://scholar.google.com/citations?user=ga2MKaMAAAAJ&hl=zh-CN), [Haoge Deng](https://scholar.google.com/citations?user=S2sbvjgAAAAJ&hl=zh-CN), [Ting Pan](https://scholar.google.com/citations?user=qQv6YbsAAAAJ&hl=zh-CN), [Wenxuan Wang](https://scholar.google.com/citations?hl=zh-CN&user=75OyC-oAAAAJ), [Huchuan Lu📧](https://scholar.google.com/citations?user=D3nE0agAAAAJ&hl=zh-CN), [Xinlong Wang📧](https://scholar.google.com/citations?user=DPz0DjYAAAAJ&hl=zh-CN)
 
 Dalian University of Technology; Beijing Academy of Artificial Intelligence; Peking University;
 Beijing University of Posts and Telecommunications; University of Chinese Academy of Sciences; Chinese Academy of Sciences Institute of Automation
 
-| [Paper](https://
+| [Paper](https://arxiv.org/abs/2502.06788) | [Code](https://github.com/baaivision/EVE) |
 </div>
 
 Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficient deployment.

@@ -36,5 +36,10 @@ We release the instruction-tuned weights of **EVEv2**.
 ## ✒️ Citation
 If **EVE** is helpful for your research, please consider **star** ⭐ and **citation** 📝 :
 ```bibtex
-
+@article{diao2025EVEv2,
+  title={EVEv2: Improved Baselines for Encoder-Free Vision-Language Models},
+  author={Diao, Haiwen and Li, Xiaotong and Cui, Yufeng and Wang, Yueze and Deng, Haoge and Pan, Ting and Wang, Wenxuan and Lu, Huchuan and Wang, Xinlong},
+  journal={arXiv preprint arXiv:2502.06788},
+  year={2025}
+}
 ```