chapter two

2 Training large language models: Learning at scale

 

This chapter covers

  • Understanding how LLMs and multimodal models are trained
  • Exploring efficient architectures, such as Mixture of Experts and sparse models
  • Improving performance through post-training and inference-time techniques
  • Examining the emergent properties of LLMs

For decades, the digital economy has run on the currency of data. The business of collecting and trading information about who we are and what we do online is worth trillions of dollars. As more of our daily activities have moved onto the internet, the mill has had ever more grist to grind. Large language models (LLMs) are inventions of the internet age, emulating human language by vacuuming up terabytes of text, image, and video data found online.

This scale has demanded new approaches that make models not just larger but also more specialized and adaptable. Researchers have developed techniques to make these models more capable and efficient, including multimodal training, which allows a model to process images and text together. Simply scaling up models isn’t always practical, however. That constraint has driven innovations such as knowledge transfer techniques, which create more efficient models, and Mixture of Experts (MoE) architectures, which allow for larger, more specialized models without a proportional increase in computational cost. Other strategies, such as post-training techniques and test-time scaling, help refine model behavior after pretraining is complete.

How are LLMs trained?

Exploring open web data collection

Demystifying autoregression and bidirectional token prediction

Training multimodal LLMs

Transferring knowledge for efficient models

Mixture of Experts and sparse models

Reasoning models

Techniques for post-training LLMs

Supervised fine-tuning

Reinforcement learning from human feedback

Direct preference optimization

Reinforcement learning from AI feedback

Emergent properties of LLMs

Learning with a few examples

Is emergence an illusion?

Conclusion

Summary