Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

upvoted an article 4 days ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

liked a dataset 4 days ago

zwhe99/DeepMath-103K

liked a model 9 days ago

deepseek-ai/DeepSeek-Math-V2

Organizations

None yet

upvoted an article 4 days ago

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022 • 376

liked a dataset 4 days ago

zwhe99/DeepMath-103K

Viewer • Updated May 29 • 103k • 15.7k • 275

liked a model 9 days ago

deepseek-ai/DeepSeek-Math-V2

Text Generation • 685B • Updated 10 days ago • 8.96k • 636

upvoted a paper 18 days ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published 28 days ago • 128

liked a model 18 days ago

WeiboAI/VibeThinker-1.5B

Text Generation • 2B • Updated 13 days ago • 27.5k • 497

liked a model 23 days ago

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated 15 days ago • 5.13k • 235

liked a dataset about 1 month ago

open-r1/DAPO-Math-17k-Processed

Viewer • Updated 26 days ago • 34.8k • 5.01k • 52

upvoted 2 papers about 2 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30 • 55

liked 3 models about 2 months ago

liked a dataset 2 months ago

jupyter-agent/jupyter-agent-dataset

Viewer • Updated Sep 10 • 95.8k • 1.68k • 151

liked a model 2 months ago

jinaai/jina-embeddings-v4

Visual Document Retrieval • 4B • Updated Sep 2 • 68.9k • 428

upvoted a collection 2 months ago

Qwen3-VL

Collection

37 items • Updated Nov 1 • 488

liked 2 models 2 months ago

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated 10 days ago • 6.65k • 339

Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

upvoted a paper 2 months ago

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24 • 119

updated a dataset 2 months ago

stillarrow/MATH

Viewer • Updated Sep 25 • 26.5k • 25

published a dataset 2 months ago

stillarrow/MATH

Viewer • Updated Sep 25 • 26.5k • 25
