12 Synthetic Data
This chapter covers
- What synthetic data is
- Why synthetic data actually works
- How models are successfully distilled from other models
- Constitutional AI and other methods for AI feedback
- Learning from rubrics and future directions of synthetic data
Reinforcement learning from human feedback is deeply rooted in the idea of keeping a human influence on the models we are building. When the first models were trained successfully with RLHF, human data was the only viable way to improve the models in this way.
Humans were the only way to write sufficiently high-quality responses to prompts for training. Humans were the only way to collect reliable, specific feedback data to train reward models.
As AI models got better, this assumption rapidly broke down. Synthetic data, which is far cheaper and easier to iterate on, enabled the shift from RLHF as the center of attention to a broader notion of “post-training” that shapes the models. This chapter provides a brief overview of how and why synthetic data is replacing or expanding many pieces of the RLHF pipeline.
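To make the simplest case concrete: instead of paying human annotators to write responses, we can sample responses to a set of prompts from an existing model and keep the prompt-response pairs as training data. The sketch below assumes the Hugging Face transformers library; the specific model, sampling parameters, and filtering rule are placeholder choices for illustration, not a prescribed recipe.

```python
# A minimal sketch, assuming the Hugging Face transformers library: replace
# human-written responses with synthetic ones by sampling completions for a
# set of prompts from an existing instruction-tuned model. The model name,
# sampling settings, and length filter are illustrative placeholders.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder: any capable "teacher" model
)

prompts = [
    "Explain the difference between a list and a tuple in Python.",
    "Write a short note on why data quality matters for fine-tuning.",
]

synthetic_pairs = []
for prompt in prompts:
    outputs = generator(
        prompt,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        return_full_text=False,  # keep only the newly generated completion
    )
    response = outputs[0]["generated_text"]
    # A crude length filter stands in for the human review step.
    if len(response.split()) > 20:
        synthetic_pairs.append({"prompt": prompt, "response": response})

print(f"Kept {len(synthetic_pairs)} synthetic prompt-response pairs for training")
```

The same pattern, swapping the generator for a stronger model or the filter for an AI judge, underlies most of the techniques covered later in this chapter.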