chapter five

5 Training Large Language Models: How to generate the generator

This chapter covers

Setting up a training environment and common libraries
Applying various training techniques, including advanced methodologies
Tips and tricks to get the most out of training

Are you ready to have some fun?! What do you mean the last four chapters weren’t fun? Well, I promise this one will be for sure. We’ve leveled up a lot and gained a ton of context that will prove invaluable now that we’re starting to get our hands dirty. By training an LLM, we can create bots that do amazing things and have unique personalities. Indeed, we can create new friends and play with them. In fact, in the last chapter we showed you how to create a training dataset based on your Slack messages, and in this one we’ll show you how to take that dataset and create a persona of yourself. Finally, you will no longer have to talk to that one annoying coworker[1].

First things first, we’ll show you how to set up a training environment, since training can be very resource-demanding and without the proper equipment you won’t be able to really enjoy what comes next. We’ll then show you how to do the basics, like training from scratch and finetuning, after which we’ll get into some of the best-known methods for improving on these processes, making them more efficient, faster, and cheaper. Lastly, we’ll end the chapter with some tips and tricks we’ve acquired through our experience of training models in the field.

5.1 Multi-GPU Environments

5.1.1 Setting up

5.1.2 Libraries

5.2 Basic Training Techniques

5.2.1 From Scratch

5.2.2 Transfer Learning (Finetuning)

5.2.3 Prompting

5.3 Advanced Training Techniques

5.3.1 Prompt Tuning

5.3.2 Finetuning with knowledge distillation

5.3.3 Reinforcement Learning with Human Feedback

5.3.4 Mixture of Experts

5.3.5 PEFT & LoRA

5.4 Training Tips and Tricks

5.4.1 Training data size notes

5.4.2 Efficient Training

5.4.3 Local minima traps

5.4.4 Hyperparameter tuning tips

5.4.5 A note on operating systems

5.4.6 Activation function advice

5.5 Summary