15 Regularization

This chapter covers

  • How a KL divergence constrains the RLHF process
  • Why regularization prevents models from producing nonsensical outputs
  • Different regularization techniques, including pretraining gradients

Throughout RLHF optimization, multiple forms of regularization are applied to prevent over-optimization of the reward model. In this context, over-optimization looks like models that output nonsensical text. Some examples of optimization going “off the rails” are models that output math reasoning that is easy to follow yet arrives at extremely incorrect answers, repeated text, unprompted switches between languages, or excessive special characters. This chapter covers the different methods that are used to control the optimization of these models.
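As a preview of the most common form of this regularization (covered in detail in section 15.1), RLHF implementations typically subtract a KL divergence term, measured between the policy being trained and a frozen reference model, from the reward-model score. The snippet below is a minimal sketch of that idea, assuming per-token log-probabilities from both models are already available as tensors; the function name, argument names, and the default penalty strength beta are illustrative rather than taken from any particular library.

```python
import torch

def kl_penalized_reward(
    reward: torch.Tensor,           # reward-model score per sequence, shape (batch,)
    policy_logprobs: torch.Tensor,  # per-token log-probs from the policy, shape (batch, seq_len)
    ref_logprobs: torch.Tensor,     # per-token log-probs from the frozen reference model, same shape
    beta: float = 0.1,              # strength of the KL penalty (illustrative value)
) -> torch.Tensor:
    # Per-token estimate of KL(policy || reference): log pi(y_t | x) - log pi_ref(y_t | x)
    per_token_kl = policy_logprobs - ref_logprobs
    # Sum over the generated tokens to get a per-sequence KL estimate
    sequence_kl = per_token_kl.sum(dim=-1)
    # Penalize the reward so the policy is discouraged from drifting far from the reference model
    return reward - beta * sequence_kl
```

In practice the penalty is often applied per token inside the RL update rather than once per sequence, but the intent is the same: keep the policy’s generations close to the reference model so that the reward model is only queried near the distribution it was trained on.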

15.1 KL Divergences in RL Optimization

15.1.1 Reference Model to Generations

15.1.2 Implementation Example

15.2 Pretraining Gradients

15.3 Other Regularization