RLHFlow
You can find open-source code, tutorials, and projects related to Reinforcement Learning from Human Feedback (RLHF) here!

Code Repositories: https://github.com/RLHFlow/
Models and Datasets: https://huggingface.co/RLHFlow/
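The models and datasets live in a regular Hugging Face organization, so they can also be browsed programmatically with the public `huggingface_hub` client. The snippet below is a small sketch of that approach; the `author` filter and result limits are illustrative choices, not part of RLHFlow itself.

```python
# Sketch: list the organization's Hub artifacts with the public
# huggingface_hub client; nothing here is RLHFlow-specific code.
from huggingface_hub import HfApi

api = HfApi()

print("Models:")
for model in api.list_models(author="RLHFlow", limit=10):
    print(" ", model.id)

print("Datasets:")
for dataset in api.list_datasets(author="RLHFlow", limit=10):
    print(" ", dataset.id)
```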
Among the projects hosted here are an interpretable reward modeling approach and a guidebook for LLM alignment.
The RLHFlow/RLHF-Reward-Modeling repository contains the recipe used to train the reward models for RLHF.
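At the core of most reward-model recipes is a pairwise preference objective. The sketch below shows the standard Bradley-Terry loss on scalar rewards for chosen and rejected responses; it is a minimal illustration of the general technique under that assumption, not the exact implementation in RLHF-Reward-Modeling, and the tensor names are illustrative.

```python
# Minimal sketch of the Bradley-Terry pairwise objective commonly used to
# train reward models for RLHF. In practice the scalar rewards come from a
# sequence-classification head on an LLM; here they are toy tensors.
# Illustrative only, not the RLHF-Reward-Modeling implementation.
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Loss = -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: scalar rewards for a batch of (chosen, rejected) response pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.5, 0.7, 1.1])
print(bradley_terry_loss(r_chosen, r_rejected))
```

Minimizing this loss pushes the model to assign higher scalar rewards to preferred responses than to rejected ones.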