10 Rejection Sampling
Rejection Sampling (RS) is a popular and simple baseline for performing preference fine-tuning. Rejection sampling operates by curating new candidate completions, filtering them based on a trained reward model, and then fine-tuning the original model only on the top completions.
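The loop below is a minimal sketch of that process under stated assumptions: `generate_completions`, `reward_model`, and `finetune` are hypothetical stand-ins for the current model's sampling routine, a trained reward model, and a standard supervised fine-tuning step, and the number of samples per prompt is an arbitrary choice rather than a recommendation.

```python
def rejection_sampling_finetune(prompts, generate_completions, reward_model, finetune,
                                n_samples_per_prompt=8):
    """Sketch of rejection sampling fine-tuning: generate, score, filter, fine-tune."""
    selected = []
    for prompt in prompts:
        # 1. Curate candidate completions by sampling from the current model.
        candidates = generate_completions(prompt, n=n_samples_per_prompt)
        # 2. Score every candidate with the trained reward model.
        scores = [reward_model(prompt, c) for c in candidates]
        # 3. Keep only the top-scoring completion for this prompt.
        best = candidates[scores.index(max(scores))]
        selected.append((prompt, best))
    # 4. Fine-tune the original model on the filtered (prompt, completion) pairs.
    return finetune(selected)
```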
The name originates from computational statistics [1], where one wishes to sample from a complex distribution but does not have a direct method to do so. To alleviate this, one samples from a simpler distribution that is easy to draw from and uses an accept/reject check to decide whether each sample is permissible. With language models, the target distribution is high-quality completions to prompts, the filter is a reward model, and the sampling distribution is the current model.
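As an illustration of the statistical technique (a sketch, not from the text): suppose we want samples from an unnormalized bimodal target density \(p(x)\) that we can evaluate but not sample from directly. We draw from a wide Gaussian proposal \(q(x)\) and accept a draw \(x\) with probability \(p(x) / (M q(x))\), where \(M\) upper-bounds the ratio \(p(x)/q(x)\); accepted draws follow the target distribution.

```python
import numpy as np

def target_pdf(x):
    # Unnormalized target: a bimodal mixture of two Gaussians we cannot sample directly.
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

def proposal_pdf(x):
    # Simpler proposal we *can* sample from: a zero-mean Gaussian with scale 3.
    return np.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * np.sqrt(2 * np.pi))

def rejection_sample(n_samples, M=12.0, seed=0):
    # M is chosen so that target_pdf(x) <= M * proposal_pdf(x) for all x.
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < n_samples:
        x = rng.normal(loc=0.0, scale=3.0)   # draw from the proposal
        u = rng.uniform()
        # Accept x if it falls under the scaled target curve; otherwise reject it.
        if u < target_pdf(x) / (M * proposal_pdf(x)):
            samples.append(x)
    return np.array(samples)

samples = rejection_sample(1000)
```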
Many prominent RLHF and preference fine-tuning papers have used rejection sampling as a baseline, but no canonical implementation or documentation exists.
WebGPT [2], Anthropic’s Helpful and Harmless agent [3], OpenAI’s popular paper on process reward models [4], Llama 2 Chat models [5], and other seminal works all use this baseline.
10.1 Training Process
A visual overview of the rejection sampling process is included below in Figure 10.1.
Figure 10.1 Rejection sampling overview.
10.1.1 Generating Completions
Let’s define a set of \(M\) prompts as a vector:
\[X = [x_1, x_2, ..., x_M]\]
These prompts can come from many sources, but most commonly they come from the instruction tuning training set.
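For concreteness, here is a sketch of sampling several candidate completions for each prompt in \(X\) with the Hugging Face `transformers` library; the model name, prompt contents, and decoding parameters are illustrative assumptions rather than choices made in this chapter.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # hypothetical choice of current policy model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

X = ["Write a haiku about rain.", "Explain overfitting in one sentence."]  # M = 2 prompts
N = 4  # candidate completions sampled per prompt

completions = []
for prompt in X:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # sample stochastically rather than greedy decode
        temperature=0.8,
        max_new_tokens=128,
        num_return_sequences=N,    # N candidate completions per prompt
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decoded sequences include the prompt text as a prefix.
    completions.append(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```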