ORION: Teaching Language Models to Reason Efficiently in the Language of Thought Paper • 2511.22891 • Published 11 days ago • 7
VIDEOP2R: Video Understanding from Perception to Reasoning Paper • 2511.11113 • Published 25 days ago • 111
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published 20 days ago • 222
FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI Paper • 2511.13524 • Published 21 days ago • 6
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 20 days ago • 74
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models Paper • 2511.11134 • Published 25 days ago • 31
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation Paper • 2511.11434 • Published 24 days ago • 44
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes Paper • 2409.04053 • Published Sep 6, 2024 • 1
DyCodeEval Collection DyCodeEval (ICML 2025) enables dynamic benchmarking for code LLMs. This collection features dynamic HumanEval and MBPP sets generated with Claude 3.5. • 3 items • Updated Jun 27 • 4
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination Paper • 2503.04149 • Published Mar 6 • 6
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking Paper • 2409.17458 • Published Sep 26, 2024 • 1