PIPer: On-Device Environment Setup via Online Reinforcement Learning • Paper • 2509.25455 • Published Sep 29 • 37
🦫 PIPer • Collection • All the resources for our paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning"! • 9 items • Updated Oct 1 • 3
FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling • Paper • 2510.24645 • Published Oct 28 • 7
Spurious Rewards: Rethinking Training Signals in RLVR • Paper • 2506.10947 • Published Jun 12 • 2
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning • Paper • 2510.27044 • Published Oct 30 • 5
Data-Efficient RLVR via Off-Policy Influence Guidance • Paper • 2510.26491 • Published Oct 30 • 9
The Path Not Taken: RLVR Provably Learns Off the Principals • Paper • 2511.08567 • Published 25 days ago • 31
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation • Paper • 2511.06307 • Published 27 days ago • 50