Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.14028

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 114
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 100
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 104

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16 • 93

about 3 hours ago

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21 • 1.15M • 2.12k • 543
nvidia/OpenMathReasoning

Viewer • Updated May 27 • 5.68M • 14.4k • 366
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5 • 17

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23 • 13

LLM + Datasets : Finance

MMMU/MMMU

Viewer • Updated Sep 19, 2024 • 11.6k • 65.3k • 303
takala/financial_phrasebank

Updated Jan 18, 2024 • 4.81k • 241
zeroshot/twitter-financial-news-sentiment

Viewer • Updated Feb 23, 2024 • 11.9k • 4.72k • 154
yixuantt/FinEntity

Viewer • Updated Jan 24, 2024 • 979 • 91 • 4

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16 • 93
One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 133
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16 • 93
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26 • 14

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 85

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 104

Papers-Benchmarks

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12, 2024 • 16
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 27
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10, 2024 • 29

June 2025 - Top Papers

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 114
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Paper • 2506.09513 • Published Jun 11 • 100
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 104

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16 • 93
One Token to Fool LLM-as-a-Judge

Paper • 2507.08794 • Published Jul 11 • 31
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16 • 93

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 133
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

Paper • 2506.14028 • Published Jun 16 • 93
UloRL:An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26 • 14

about 3 hours ago

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19 • 45
facebook/natural_reasoning

Viewer • Updated Feb 21 • 1.15M • 2.12k • 543
nvidia/OpenMathReasoning

Viewer • Updated May 27 • 5.68M • 14.4k • 366
Search Arena: Analyzing Search-Augmented LLMs

Paper • 2506.05334 • Published Jun 5 • 17

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10 • 43
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14 • 85

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23 • 13

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 104

LLM + Datasets : Finance

MMMU/MMMU

Viewer • Updated Sep 19, 2024 • 11.6k • 65.3k • 303
takala/financial_phrasebank

Updated Jan 18, 2024 • 4.81k • 241
zeroshot/twitter-financial-news-sentiment

Viewer • Updated Feb 23, 2024 • 11.9k • 4.72k • 154
yixuantt/FinEntity

Viewer • Updated Jan 24, 2024 • 979 • 91 • 4

Papers-Benchmarks

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Paper • 2406.08587 • Published Jun 12, 2024 • 16
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Paper • 2406.09170 • Published Jun 13, 2024 • 27
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

Paper • 2407.18901 • Published Jul 26, 2024 • 35
Benchmarking Agentic Workflow Generation

Paper • 2410.07869 • Published Oct 10, 2024 • 29

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs