Chapter Six

6 Reinforcement Learning


This chapter covers

  • How reinforcement learning is used to optimize human preferences
  • Mathematics and intuitions for policy-gradient algorithms
  • The derivations and trade-offs of different algorithms
  • New algorithms for reasoning models, such as GRPO, GSPO, and CISPO
  • Implementation details of modern RLHF stacks

In the RLHF process, the reinforcement learning algorithm iteratively updates the model's weights based on feedback from a reward model. The policy, that is, the model being trained, generates completions to prompts from the training set; the reward model scores those completions; and the reinforcement learning optimizer takes gradient steps based on those scores (see fig. 6.1 for an overview). This chapter explains the mathematics and trade-offs of the various algorithms used to learn from the signal the reward model assigns to on-policy data (i.e., data generated by the current version of the model being trained). These algorithms run for many epochs, often thousands or millions of batches over a large set of prompts, with a gradient update after each batch. A minimal sketch of this loop appears below.
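To make the generate-score-update loop concrete before diving into specific algorithms, the following is a minimal sketch in Python with toy stand-ins. The names (policy, reward_model, the tiny linear networks, and the REINFORCE-style loss with a batch-mean baseline) are illustrative assumptions rather than the API of any particular RLHF library; sections 6.1 and 6.2 cover the real algorithms and implementation details.

import torch
import torch.nn.functional as F

# Toy stand-ins for the three components of the RLHF loop. These names and
# shapes are assumptions for illustration only, not a real training stack.
VOCAB, SEQ_LEN, BATCH = 8, 6, 4

policy = torch.nn.Linear(VOCAB, VOCAB)      # maps a one-hot "context" to next-token logits
reward_model = torch.nn.Linear(SEQ_LEN, 1)  # scores a completion (here: its token ids as floats)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(100):  # in practice: thousands or millions of batches
    # 1) The policy generates completions to a batch of prompts (on-policy data).
    context = F.one_hot(torch.randint(VOCAB, (BATCH,)), VOCAB).float()
    tokens, log_probs = [], []
    for _ in range(SEQ_LEN):
        logits = policy(context)
        dist = torch.distributions.Categorical(logits=logits)
        tok = dist.sample()
        tokens.append(tok)
        log_probs.append(dist.log_prob(tok))
        context = F.one_hot(tok, VOCAB).float()
    completion = torch.stack(tokens, dim=1).float()           # (BATCH, SEQ_LEN)
    sum_log_prob = torch.stack(log_probs, dim=1).sum(dim=1)   # (BATCH,)

    # 2) The reward model scores the completions (no gradients flow through it).
    with torch.no_grad():
        rewards = reward_model(completion).squeeze(-1)        # (BATCH,)

    # 3) The RL optimizer takes a gradient step on a policy-gradient loss,
    #    here plain REINFORCE with a batch-mean baseline (the simplest case in 6.1).
    advantages = rewards - rewards.mean()
    loss = -(advantages * sum_log_prob).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Every algorithm in this chapter fills in step 3 differently, for example by adding clipping and a value function (PPO) or by computing advantages from groups of completions to the same prompt (GRPO and its descendants); steps 1 and 2 stay structurally the same.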

6.1 Policy Gradient Algorithms

6.1.1 Vanilla Policy Gradient

6.1.2 REINFORCE

6.1.3 REINFORCE Leave One Out (RLOO)

6.1.4 Proximal Policy Optimization (PPO)

6.1.5 Group Relative Policy Optimization (GRPO)

6.1.6 Group Sequence Policy Optimization (GSPO)

6.1.7 Clipped Importance Sampling Policy Optimization (CISPO)

6.1.8 Comparing Algorithms

6.2 Implementation

6.2.1 Policy Gradient Basics

6.2.2 Loss Aggregation

6.2.3 Asynchronicity

6.2.4 Proximal Policy Optimization

6.2.5 Group Relative Policy Optimization