7 Finetuning to Follow Instructions


This chapter covers

  • Introduction to the instruction finetuning process for LLMs
  • Preparing a dataset for supervised instruction finetuning
  • Organizing instruction data in training batches
  • Loading a pretrained LLM and finetuning it to follow human instructions
  • Extracting LLM-generated instruction responses for evaluation
  • Evaluating an instruction-finetuned LLM

In earlier chapters, we implemented the LLM architecture, carried out pretraining, and imported pretrained weights from external sources into our model. In the previous chapter, we then finetuned the LLM for a specific classification task: distinguishing between spam and non-spam text messages. In this chapter, we implement the process of finetuning an LLM to follow human instructions, as illustrated in figure 7.1. Instruction finetuning is one of the main techniques behind developing LLMs for chatbot applications, personal assistants, and other conversational tasks.

Figure 7.1 A mental model of the three main stages of coding an LLM, pretraining the LLM on a general text dataset, and finetuning it. This chapter focuses on finetuning a pretrained LLM to follow human instructions.
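Before diving in, it may help to see what a single instruction-finetuning example could look like. The sketch below is illustrative only: the "instruction"/"input"/"output" keys and the Alpaca-style prompt template are common conventions and an assumption here, not necessarily the exact format we adopt later in this chapter.

```python
# A minimal sketch of a supervised instruction-finetuning example.
# The keys and prompt template below are illustrative assumptions,
# not the definitive format used in this chapter.

sample_entry = {
    "instruction": "Rewrite the following sentence in passive voice.",
    "input": "The chef cooked the meal.",
    "output": "The meal was cooked by the chef.",
}

def format_prompt(entry):
    """Combine an instruction (and optional input) into one prompt string."""
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{entry['instruction']}"
    )
    if entry["input"]:  # some entries have no auxiliary input text
        prompt += f"\n\n### Input:\n{entry['input']}"
    return prompt

# During finetuning, the model learns to generate the "output" text
# as the continuation of this formatted prompt.
print(format_prompt(sample_entry))
```

The key idea this sketch conveys: supervised instruction finetuning turns each (instruction, response) pair into a single text sequence, so the same next-token training objective used in pretraining can teach the model to follow instructions.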

7.1 Introduction to instruction finetuning

7.2 Preparing a dataset for supervised instruction finetuning

7.3 Organizing data into training batches

7.4 Creating data loaders for an instruction dataset

7.5 Loading a pretrained LLM

7.6 Finetuning the LLM on instruction data

7.7 Extracting and saving responses

7.8 Evaluating the finetuned LLM

7.9 Conclusions

7.9.1 What's next?

7.9.2 Staying up to date in a fast-moving field

7.9.3 Final words

7.10 Summary