9 Rejection Sampling

This chapter covers

  • What rejection sampling is
  • Why rejection sampling may be used instead of RL

Rejection Sampling (RS) is one of the most widely used yet least documented methods in preference fine-tuning. Many prominent RLHF papers use it as a core component of their training pipelines, yet there is no canonical implementation and no settled explanation of why it works so well. RS can be applied at multiple points in the training pipeline (after instruction fine-tuning, after RL-based optimization, or even after RLVR), making it a versatile but hard-to-place tool. Combined with its underdocumented nature, this is why it appears here at the end of the core optimization methods. Rejection sampling operates by generating new candidate completions, filtering them with a trained reward model, and then fine-tuning the original model only on the top completions, using the same loss function as instruction tuning.
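The pipeline above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `generate` and `reward` are hypothetical stand-ins for sampling N completions from the current policy and scoring one completion with a trained reward model, replaced here by toy functions so the sketch runs.

```python
import random

# Hypothetical stand-ins: in practice `generate` samples N completions from
# the current policy model and `reward` scores one completion with a trained
# reward model. Neither name refers to a specific library API.
def generate(prompt, n):
    return [f"{prompt} [completion {i}]" for i in range(n)]

def reward(prompt, completion):
    # Deterministic toy score so the sketch is reproducible within one run.
    rng = random.Random(hash((prompt, completion)) % (2**32))
    return rng.random()

def rejection_sample_dataset(prompts, n=8, top_k=1):
    """Build a fine-tuning set by keeping only the top-k completions per prompt."""
    dataset = []
    for prompt in prompts:
        candidates = generate(prompt, n)
        # Rank candidates by reward-model score, best first.
        ranked = sorted(candidates, key=lambda c: reward(prompt, c), reverse=True)
        dataset.extend((prompt, c) for c in ranked[:top_k])
    return dataset

pairs = rejection_sample_dataset(["Explain rejection sampling."], n=4)
```

The resulting (prompt, completion) pairs are then trained on with the standard instruction-tuning loss, i.e. next-token cross-entropy on the completion tokens.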

The name originates in computational statistics [1], where one wishes to sample from a complex distribution but has no direct method to do so. Instead, one draws samples from a simpler proposal distribution and uses an acceptance test to decide whether each sample is kept. With language models, the target distribution is high-quality completions to prompts, the acceptance test is a reward model, and the proposal distribution is the current model.
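To make the statistical origin concrete, here is a minimal sketch of classical rejection sampling. The target and proposal are chosen purely for illustration: an unnormalized Beta(2, 2) density on [0, 1], sampled via a Uniform(0, 1) proposal with envelope constant m.

```python
import random

def target_density(x):
    # Beta(2, 2) density on [0, 1]: 6 * x * (1 - x), maximum 1.5 at x = 0.5.
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, m=1.5, seed=0):
    """Classical rejection sampling with a Uniform(0, 1) proposal q(x) = 1.

    `m` must satisfy target_density(x) <= m * q(x) for all x; since the
    target's maximum is 1.5, m = 1.5 is the tightest valid envelope here.
    """
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate drawn from the proposal
        u = rng.random()  # uniform threshold for the acceptance test
        # Accept with probability target_density(x) / (m * q(x)).
        if u <= target_density(x) / m:
            samples.append(x)
    return samples

draws = rejection_sample(1000)
```

Rejected candidates are simply discarded, so a loose envelope constant m wastes proposals; the analogue in the RLHF setting is generating many completions per prompt and discarding all but the highest-scoring ones.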

9.1 Training Process Step By Step

9.1.1 Generating Completions

9.1.2 Scoring Completions

9.1.3 Fine-tuning

9.2 Implementation Details

Summary