RLHFlow

university

AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/

Recent Activity

baohao updated a collection about 2 months ago

baohao updated a model about 2 months ago

RLHFlow/Qwen2.5-Math-1.5B-DAPO-easy

Papers

Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

RLHFlow's models (37)

RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • 8B • Updated Oct 14, 2024 • 512 downloads • 38 likes

RLHFlow/LLaMA3-iterative-DPO-final

Text Generation • 8B • Updated Oct 14, 2024 • 141 downloads • 41 likes

RLHFlow/LLaMA3.2-3B-SFT

Text Generation • 3B • Updated Oct 1, 2024 • 168 downloads

RLHFlow/LLaMA3.2-1B-SFT

Text Generation • 1B • Updated Oct 1, 2024 • 32 downloads

RLHFlow/ArmoRM-Llama3-8B-v0.1

Text Classification • 8B • Updated Sep 23, 2024 • 12.4k downloads • 183 likes

RLHFlow/DPA-v1-Mistral-7B

Text Generation • 7B • Updated May 23, 2024 • 140 downloads • 1 like

RLHFlow/RewardModel-Mistral-7B-for-DPA-v1

Text Classification • 7B • Updated May 23, 2024 • 282 downloads • 4 likes