Bradley-Terry

Reward Modeling Part 1: Bradley-Terry Model

This post is the recipe behind the RLHFlow/RLHF-Reward-Modeling repository, which is used to train the reward model for RLHF.

March 23, 2024 · 12 min · Wei Xiong, Hanze Dong, Rui Yang
