MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models Paper • 2406.07594 • Published Jun 11, 2024
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models Paper • 2406.14952 • Published Jun 21, 2024
A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models Paper • 2508.12903 • Published Aug 18, 2025
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking Paper • 2509.23392 • Published Sep 27, 2025
Reflection-Bench: probing AI intelligence with reflection Paper • 2410.16270 • Published Oct 21, 2024