15 Regularization
This chapter covers
- How a KL divergence constrains the RLHF process
- Why regularization prevents models from producing nonsensical outputs
- Different regularization techniques including pretraining gradients
Throughout RLHF optimization, several regularization steps are used to prevent over-optimization of the reward model. Over-optimization in this context looks like models that output nonsensical text. Examples of optimization going “off the rails” include models producing math reasoning that is easy to follow yet arrives at extremely incorrect answers, repeated text, switching languages mid-response, or excessive special characters. This chapter covers the different methods that are used to control the optimization of models.
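The most common of these controls is the KL penalty referenced above: the reward passed to the RL optimizer is the reward model’s score minus a scaled per-token estimate of the KL divergence between the policy and a frozen reference model. The sketch below illustrates this idea only; the function name `kl_penalized_rewards`, the tensor shapes, the convention of adding the reward-model score at the final token, and the coefficient value `beta=0.1` are assumptions for illustration, not a specific library’s implementation.

```python
import torch

def kl_penalized_rewards(
    policy_logprobs: torch.Tensor,      # log-probs of sampled tokens under the policy, shape (batch, seq_len)
    ref_logprobs: torch.Tensor,         # log-probs of the same tokens under the frozen reference model
    reward_model_scores: torch.Tensor,  # scalar score per sequence from the reward model, shape (batch,)
    beta: float = 0.1,                  # assumed KL coefficient; tuned per setup
) -> torch.Tensor:
    """Combine reward-model scores with a per-token KL penalty.

    Subtracting beta * (log pi - log pi_ref) discourages the policy from
    drifting far from the reference model, which is the drift that tends
    to produce nonsensical outputs.
    """
    # Per-token estimate of KL(pi || pi_ref) along the sampled trajectory.
    kl_per_token = policy_logprobs - ref_logprobs        # (batch, seq_len)

    # Start from the KL penalty at every token...
    rewards = -beta * kl_per_token                       # (batch, seq_len)

    # ...and add the reward-model score at the last token of each sequence,
    # a common convention in RLHF-style implementations.
    rewards[:, -1] += reward_model_scores
    return rewards
```

A larger `beta` keeps generations closer to the reference model at the cost of smaller reward-model gains; a smaller `beta` allows more aggressive optimization and more risk of the failure modes described above.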