Scaling Agent Learning via Experience Synthesis Paper • 2511.03773 • Published about 1 month ago • 80
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30 • 115 • 5
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28 • 15 • 1
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9 • 41
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29 • 18
The Majority is not always right: RL training for solution aggregation Paper • 2509.06870 • Published Sep 8 • 16 • 2
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2 • 24
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning Paper • 2505.10320 • Published May 15 • 24
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published Mar 19 • 13
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 108
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 90