17 Evaluation

Evaluation is an ever-evolving practice. The key to understanding language model evaluation, particularly for post-training, is that the popular evaluation regimes of any period reflect the training best practices and goals of that period. While challenging evaluations drive progress in language models toward new capabilities, the majority of evaluation is designed to provide useful signals for training new models.

In many ways, this chapter is designed to present vignettes of popular evaluation regimes from the early history of RLHF, so that readers can understand the common themes, details, and failure modes.

Evaluation for RLHF and post-training has gone through a few distinct phases in its early history:

17.1 Prompt Formatting: From Few-shot to Zero-shot to CoT

17.2 Using Evaluations vs. Observing Evaluations

17.3 Contamination

17.4 Tooling
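
To make the first of these phases concrete, the snippet below sketches how the same question could be rendered as a few-shot, zero-shot, or zero-shot chain-of-thought (CoT) prompt. It is a minimal illustration: the question and the worked exemplars are invented for this sketch, and real benchmarks define their own templates.

```python
# A minimal sketch of the three prompt formats named in 17.1.
# The question and few-shot exemplars are hypothetical, chosen for illustration.

QUESTION = "What is 17 * 3?"

# Few-shot: prepend worked examples so the model can infer the answer format
# from context, the dominant style for evaluating early base models.
few_shot_prompt = (
    "Q: What is 2 * 4?\nA: 8\n\n"
    "Q: What is 5 * 6?\nA: 30\n\n"
    f"Q: {QUESTION}\nA:"
)

# Zero-shot: the bare question, relying on instruction-tuned behavior
# rather than in-context examples.
zero_shot_prompt = f"Q: {QUESTION}\nA:"

# Zero-shot CoT: elicit intermediate reasoning before the final answer,
# e.g. with a trigger phrase such as "Let's think step by step."
cot_prompt = f"Q: {QUESTION}\nA: Let's think step by step."

for name, prompt in [
    ("few-shot", few_shot_prompt),
    ("zero-shot", zero_shot_prompt),
    ("zero-shot CoT", cot_prompt),
]:
    print(f"--- {name} ---\n{prompt}\n")
```

Which of these formats an evaluation assumes changes what is being measured, which is why shifts in prompt formatting mark distinct phases in evaluation practice.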