Reward Modeling Part 1: Bradley-Terry Model
This is the recipe from the RLHFlow/RLHF-Reward-Modeling repository for training the reward model used in RLHF.
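For orientation, here is a minimal sketch of the standard Bradley-Terry preference objective that this kind of reward model is usually trained with. It assumes the model assigns a scalar reward to each response and that the preferred ("chosen") response should score higher than the "rejected" one. The function names `bt_probability` and `bt_loss` are illustrative and are not taken from the repository; the actual training code may differ in its details.

```python
import math


def bt_probability(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response is preferred.

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    """
    d = r_chosen - r_rejected
    # Branch on the sign of d so that exp() never overflows.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of one preference pair: -log sigmoid(d)."""
    d = r_chosen - r_rejected
    # -log sigmoid(d) = log(1 + exp(-d)), written in a numerically stable form.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


if __name__ == "__main__":
    # Hypothetical rewards for one (chosen, rejected) pair.
    print(bt_probability(1.5, 0.5))
    print(bt_loss(1.5, 0.5))
```

The loss shrinks as the reward margin `r_chosen - r_rejected` grows, so minimizing it over a preference dataset pushes the model to rank chosen responses above rejected ones.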