15 Regularization
This chapter covers
- How a KL divergence constrains the RLHF process
- Why regularization prevents models from producing nonsensical outputs
- Different regularization techniques including pretraining gradients
Throughout RLHF optimization, several regularization steps are used to prevent over-optimization of the reward model. Over-optimization in this context looks like models that output nonsensical text. Examples of optimization going “off the rails” include models producing math reasoning that is easy to follow yet arrives at extremely incorrect answers, repeated text, switching languages mid-response, or excessive special characters. This chapter covers the different methods that are used to control the optimization of models.
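The most common of these controls is the KL penalty referenced above: the reward passed to the RL optimizer is the reward model’s score minus a scaled per-token estimate of the KL divergence between the policy and a frozen reference model. The sketch below illustrates this idea only; the function name `kl_penalized_rewards`, the tensor shapes, the convention of adding the reward-model score at the final token, and the coefficient value `beta=0.1` are assumptions for illustration, not a specific library’s implementation.

```python
import torch

def kl_penalized_rewards(
    policy_logprobs: torch.Tensor,      # log-probs of sampled tokens under the policy, shape (batch, seq_len)
    ref_logprobs: torch.Tensor,         # log-probs of the same tokens under the frozen reference model
    reward_model_scores: torch.Tensor,  # scalar score per sequence from the reward model, shape (batch,)
    beta: float = 0.1,                  # assumed KL coefficient; tuned per setup
) -> torch.Tensor:
    """Combine reward-model scores with a per-token KL penalty.

    Subtracting beta * (log pi - log pi_ref) discourages the policy from
    drifting far from the reference model, which is the drift that tends
    to produce nonsensical outputs.
    """
    # Per-token estimate of KL(pi || pi_ref) along the sampled trajectory.
    kl_per_token = policy_logprobs - ref_logprobs        # (batch, seq_len)

    # Start from the KL penalty at every token...
    rewards = -beta * kl_per_token                       # (batch, seq_len)

    # ...and add the reward-model score at the last token of each sequence,
    # a common convention in RLHF-style implementations.
    rewards[:, -1] += reward_model_scores
    return rewards
```

A larger `beta` keeps generations closer to the reference model at the cost of smaller reward-model gains; a smaller `beta` allows more aggressive optimization and more risk of the failure modes described above.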