- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 49
- Reinforcement Learning for Reasoning in Large Language Models with One Training Example
  Paper • 2504.20571 • Published • 98
- RLPR: Extrapolating RLVR to General Domains without Verifiers
  Paper • 2506.18254 • Published • 31
Igor Kilbas
kaleinaNyan
AI & ML interests
Computer Vision, NLP
Recent Activity
liked a dataset about 18 hours ago: t-tech/ruAIME-2025
liked a dataset about 18 hours ago: t-tech/ruMMLU-pro
liked a dataset about 18 hours ago: t-tech/ruMATH-500
Organizations
None yet