UniLiP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing Paper • 2507.23278 • Published Jul 31 • 1
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction Paper • 2502.17239 • Published Feb 24 • 3
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies Paper • 2503.14324 • Published Mar 18 • 2
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning Paper • 2503.19470 • Published Mar 25 • 19
Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer Paper • 2304.11818 • Published Apr 24, 2023
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 40
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3 • 8
S³c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners Paper • 2409.01524 • Published Sep 3, 2024 • 1
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning Paper • 2409.12929 • Published Sep 19, 2024 • 2
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning Paper • 2410.12952 • Published Oct 16, 2024