5 How do we constrain the behavior of LLMs?

 

This chapter covers

  • Constraining the behavior of LLMs to make them more useful
  • The four areas where we can constrain LLM behavior
  • How fine-tuning allows us to update LLMs
  • How reinforcement learning can change the output of LLMs
  • Modifying the inputs of an LLM using retrieval-augmented generation

It may seem counterintuitive that you can make a model more useful by controlling the output it is allowed to produce, but doing so is almost always necessary when working with LLMs. This control is needed because, when presented with an arbitrary text prompt, an LLM will attempt to generate what it believes to be an appropriate response, regardless of the purpose it was deployed for. Consider a chatbot helping a customer buy a car: you do not want the LLM going off-script and chatting about sports just because the customer asked something about taking the vehicle to their kid's soccer games.
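As a preview, the lightest-weight way to keep such a chatbot on topic is to modify the model's input by prepending an instruction. The sketch below is illustrative only, assuming the openai Python client; the model name and the wording of the system prompt are placeholders, not the approach developed in this chapter.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a sales assistant for an auto dealership. "
    "Only discuss vehicles, financing, and scheduling test drives. "
    "If the customer raises unrelated topics, politely steer the "
    "conversation back to their car purchase."
)

def reply(customer_message: str) -> str:
    """Generate a response that stays within the dealership's scope."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-tuned model works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": customer_message},
        ],
    )
    return response.choices[0].message.content

print(reply("Will this SUV fit all the gear for my kid's soccer games?"))

A system prompt like this only modifies the model's input; the rest of the chapter covers deeper interventions, such as fine-tuning and reinforcement learning from human feedback.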

In this chapter, we will discuss in more detail why you would want to limit, or constrain, the output an LLM produces, and the nuances associated with such constraints. Accurately constraining an LLM is one of the hardest things to accomplish, because LLMs are trained simply to complete their input based on what they observed in the training data, and there are currently no perfect solutions. Over the course of this chapter, we will discuss the four potential places where an LLM's behavior can be modified.

5.1 Why do we want to constrain behavior?

5.1.1 Base models are not very usable

5.1.2 Not all model outputs are desirable

5.1.3 Some cases require specific formatting

5.2 Fine-tuning: The primary method of changing behavior

5.2.1 Supervised fine-tuning

5.2.2 Reinforcement learning from human feedback

5.2.3 Fine-tuning: The big picture

5.3 The mechanics of RLHF

5.3.1 Beginning with a naive RLHF

5.3.2 The quality reward model

5.3.3 The similar-but-different RLHF objective

5.4 Other factors in customizing LLM behavior

Summary