RLHFlow/pair-preference-model-LLaMA3-8B
Text Generation • 8B
Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/