- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 259
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding
  Paper • 2507.15846 • Published • 133
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
  Paper • 2507.22827 • Published • 99
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
Collections
Collections including paper arxiv:2508.18265
- Motif-Technologies/Motif-2.6B
  Text Generation • 3B • Updated • 176 • 78
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 1.31k • 132
- openbmb/MiniCPM-V-4_5
  Image-Text-to-Text • 9B • Updated • 50.8k • 1.02k

- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
  Paper • 2508.11987 • Published • 71
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 497

- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 158
- Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
  Paper • 2507.22448 • Published • 66
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
  Paper • 2508.21113 • Published • 110

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 8.19k • 1.23k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 139 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 1.31k • 132
- OpenGVLab/InternVL3_5-38B
  Image-Text-to-Text • 38B • Updated • 5.93k • 37
- OpenGVLab/InternVL3_5-30B-A3B
  Image-Text-to-Text • 31B • Updated • 43.7k • 37

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 1.31k • 132
- OpenGVLab/InternVL3_5-38B
  Image-Text-to-Text • 38B • Updated • 5.93k • 37
- OpenGVLab/InternVL3_5-30B-A3B
  Image-Text-to-Text • 31B • Updated • 43.7k • 37

- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
  Paper • 2310.19909 • Published • 21
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 37
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34

- Perception-Aware Policy Optimization for Multimodal Reasoning
  Paper • 2507.06448 • Published • 47
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
  Paper • 2507.05920 • Published • 11
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13