RLHFlow

You can find open-source code, tutorials, and projects related to Reinforcement Learning from Human Feedback (RLHF) here!

Code Repositories: https://github.com/RLHFlow/
Models and Datasets: https://huggingface.co/RLHFlow/

Latest Posts

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

An interpretable reward modeling approach.

May 29, 2024 · 15 min · Haoxiang Wang

Alignment Guidebook

A guidebook for LLM alignment.

March 26, 2024 · 49 min · Shangmin Guo, Wei Xiong

Reward Modeling Part 1: Bradley-Terry Model

The recipe behind the RLHFlow/RLHF-Reward-Modeling repository, used to train reward models for RLHF.

March 23, 2024 · 12 min · Wei Xiong, Hanze Dong, Rui Yang