Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 6 days ago • 77
SocialEval: Evaluating Social Intelligence of Large Language Models Paper • 2506.00900 • Published Jun 1
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers Paper • 2509.03059 • Published Sep 3 • 24
SafetyBench: Evaluating the Safety of Large Language Models with Multiple Choice Questions Paper • 2309.07045 • Published Sep 13, 2023
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement Paper • 2502.16776 • Published Feb 24 • 6
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 240
CPM: A Large-scale Generative Chinese Pre-trained Language Model Paper • 2012.00413 • Published Dec 1, 2020