RLHFlow

You can find open-source code, tutorials, and projects related to Reinforcement Learning from Human Feedback (RLHF) here!

Code Repositories: https://github.com/RLHFlow/
Models and Datasets: https://huggingface.co/RLHFlow/

Latest Posts

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

An interpretable reward modeling approach.

May 29, 2024 · 15 min · Haoxiang Wang

Alignment Guidebook

A guidebook for LLM alignment.

March 26, 2024 · 49 min · Shangmin Guo, Wei Xiong

Reward Modeling Part 1: Bradley-Terry Model

The recipe behind the RLHFlow/RLHF-Reward-Modeling repository, used to train reward models for RLHF.

March 23, 2024 · 12 min · Wei Xiong, Hanze Dong, Rui Yang