
11 Preference Data

 

This chapter covers

  • The core data engine behind RLHF
  • Why collecting human data can be extremely challenging
  • Decisions to be made when collecting preference data
  • Open questions in preference data

Preference data is the engine of preference fine-tuning and reinforcement learning from human feedback. The core problem RLHF tries to solve is that we cannot precisely model human rewards and preferences for AI models' outputs, i.e., we cannot write clearly defined loss functions to optimize against, so preference data is the proxy signal we use to tune our models. The data is what allows us to match the behaviors we want and avoid the failure modes we do not. This signal is so rich that it is difficult to replace this style of optimization entirely. Within preference fine-tuning, many methods for collecting and using this data have been proposed, and because human preferences cannot be captured in a clean reward function, many more will emerge to support the collection of labeled preference data at the center of RLHF and related techniques. Today, two intertwined challenges define work on preference data and run throughout this chapter: 1) the operational complexity and cost of collection, and 2) the need for preference data to be collected on generations from the model being trained (called "on-policy" data).
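To make the shape of this data concrete, the sketch below shows the pairwise format most commonly used in RLHF pipelines (a prompt plus a chosen and a rejected completion) and an on-policy collection loop in which both candidate completions are sampled from the model being trained. This is a minimal illustration, not a reference implementation: the names PreferenceRecord, generate_from_policy, and ask_annotator are hypothetical placeholders. In practice the generation step would call your current policy model and the labeling step would route comparisons to a human annotation interface.

```python
# Minimal sketch of pairwise, on-policy preference data collection.
# All functions below are illustrative placeholders, not a real annotation API.

from dataclasses import dataclass
import random


@dataclass
class PreferenceRecord:
    """One labeled comparison: the prompt plus the chosen and rejected completions."""
    prompt: str
    chosen: str
    rejected: str


def generate_from_policy(prompt: str) -> str:
    """Placeholder: sample one completion from the current policy (the model being trained)."""
    return random.choice([f"Draft A for: {prompt}", f"Draft B for: {prompt}"])


def ask_annotator(prompt: str, completion_a: str, completion_b: str) -> int:
    """Placeholder: a human labeler returns 0 if completion_a is preferred, 1 otherwise."""
    return random.randint(0, 1)


def collect_on_policy_preferences(prompts: list[str]) -> list[PreferenceRecord]:
    dataset = []
    for prompt in prompts:
        # On-policy: both candidates come from the model currently being trained,
        # so the labels reflect its actual output distribution.
        completion_a = generate_from_policy(prompt)
        completion_b = generate_from_policy(prompt)
        preferred = ask_annotator(prompt, completion_a, completion_b)
        chosen, rejected = (
            (completion_a, completion_b) if preferred == 0 else (completion_b, completion_a)
        )
        dataset.append(PreferenceRecord(prompt, chosen, rejected))
    return dataset


if __name__ == "__main__":
    records = collect_on_policy_preferences(["Explain RLHF in one sentence."])
    print(records[0])
```

The key property of the loop is that annotators only ever judge outputs the current model can actually produce, which is what makes the resulting labels on-policy.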

In this chapter, we detail the technical decisions that shape how preference data is formatted and the organizational practices needed to collect it.

11.1 Why We Need Preference Data

11.2 Collecting Preference Data

11.2.1 Interface

11.2.2 Rankings vs. Ratings

11.2.3 Multi-turn Data

11.2.4 Structured Preference Data

11.2.5 Sourcing and Contracts

11.3 Bias: Things to Watch Out For in Data Collection

11.4 Open Questions in RLHF Preference Data