5 Pretraining on unlabeled data

This chapter covers

  • Computing the training and validation set losses to assess the quality of LLM-generated text during training
  • Implementing a training function and pretraining the LLM
  • Saving and loading model weights to continue training an LLM
  • Loading pretrained weights from OpenAI

Thus far, we have implemented the data sampling and attention mechanisms and coded the LLM architecture. It is now time to implement a training function and pretrain the LLM. We will learn basic model evaluation techniques for measuring the quality of the generated text, a prerequisite for optimizing the LLM during training. Moreover, we will discuss how to load pretrained weights, giving our LLM a solid starting point for fine-tuning. Figure 5.1 lays out our overall plan, highlighting what we will discuss in this chapter.
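As a preview of the evaluation idea behind this chapter, the sketch below shows how a per-batch loss can be computed with PyTorch's cross-entropy function. This is a minimal, illustrative example, not the chapter's actual code: the tiny embedding-plus-linear model and the name `calc_batch_loss` are stand-ins for the GPT model built in earlier chapters and the loss utilities developed later.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)

# Stand-in for the GPT model from earlier chapters: maps token IDs
# to logits of shape (batch, seq_len, vocab_size).
vocab_size, emb_dim = 50257, 16
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, emb_dim),
    torch.nn.Linear(emb_dim, vocab_size),
)

def calc_batch_loss(input_batch, target_batch, model):
    # Hypothetical helper: cross-entropy between predicted logits and
    # the next-token targets, with batch and sequence dims flattened.
    logits = model(input_batch)
    return F.cross_entropy(logits.flatten(0, 1), target_batch.flatten())

inputs = torch.randint(0, vocab_size, (2, 4))   # two sequences of 4 token IDs
targets = torch.randint(0, vocab_size, (2, 4))  # corresponding next-token IDs
loss = calc_batch_loss(inputs, targets, model)
print(loss.item())
```

Averaging this quantity over the training and validation sets gives the training and validation losses used throughout the chapter to track pretraining progress.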

Figure 5.1 The three main stages of coding an LLM. This chapter focuses on stage 2: pretraining the LLM (step 4), which includes implementing the training code (step 5), evaluating the performance (step 6), and saving and loading model weights (step 7).

5.1 Evaluating generative text models

5.1.1 Using GPT to generate text

5.1.2 Calculating the text generation loss

5.1.3 Calculating the training and validation set losses

5.2 Training an LLM

5.3 Decoding strategies to control randomness

5.3.1 Temperature scaling

5.3.2 Top-k sampling

5.3.3 Modifying the text generation function

5.4 Loading and saving model weights in PyTorch

5.5 Loading pretrained weights from OpenAI

Summary