- MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer
  Paper • 2509.16197 • Published • 56
- InternRobotics/VLAC
  Robotics • 2B • Updated • 43 • 37
- LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence
  Paper • 2509.12203 • Published • 19
- A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
  Paper • 2509.15937 • Published • 20
fysp
AI & ML interests
tech, ai, climate, social, disrupt
Organizations
None yet