
5 How do we constrain the behavior of LLMs?

 

This chapter covers

  • Why constraining LLM behavior makes them more useful
  • The four areas where we can constrain LLM behavior
  • How fine-tuning allows us to update LLMs
  • How Reinforcement Learning can change the output of LLMs
  • How to modify the inputs of an LLM using Retrieval Augmented Generation

It may seem counter-intuitive that you can make a model more useful by controlling the output it is allowed to produce, but doing so is almost always necessary when working with LLMs. The reason is that, when presented with an arbitrary text prompt, an LLM will attempt to generate whatever it believes to be an “appropriate” response, regardless of the model’s intended use. Consider a chatbot helping a customer buy a car: you do not want the LLM going “off-script” and chatting about sports just because the customer asked something related to driving the vehicle to their kid’s soccer games.
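To make the problem concrete, here is a minimal sketch of the most basic form of constraint: instructions embedded directly in the prompt. The generate function is a hypothetical stand-in for whichever LLM API you are using; it is not part of any specific library.

    def generate(prompt: str) -> str:
        """Hypothetical stand-in for a call to any text-generation LLM API."""
        return "..."  # replace with a real model call

    # Unconstrained: the model is free to answer about anything,
    # including wandering off into a conversation about youth soccer.
    reply = generate("Customer: Can I fit my kid's soccer gear in the trunk?")

    # Constrained: instructions in the prompt steer the model back to its task.
    constrained_prompt = (
        "You are a car-sales assistant. Only discuss vehicles, pricing, "
        "and financing. Politely redirect unrelated questions.\n"
        "Customer: Can I fit my kid's soccer gear in the trunk?"
    )
    reply = generate(constrained_prompt)

Prompt instructions like these are only a partial fix; the rest of the chapter covers stronger interventions such as fine-tuning, RLHF, and retrieval.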

In this chapter, we will discuss in more detail why you would want to limit, or “constrain,” the output an LLM produces and the nuances associated with such constraints. Accurately constraining an LLM is one of the hardest things to accomplish, because LLMs are trained to complete input based on what they observe in their training data. Currently, there are no perfect solutions. We will discuss the four potential places where an LLM’s behavior can be modified:

5.1 Why do we want to constrain behavior?

5.1.1 Base models are not very usable

5.1.2 Not all model outputs are desirable

5.1.3 Some cases require specific formatting

5.2 Fine-tuning: the primary method of changing behavior

5.2.1 Supervised Fine-tuning

5.2.2 Reinforcement Learning from Human Feedback

5.2.3 Fine-tuning: the big picture

5.3 The mechanics of RLHF

5.3.1 Beginning with a naive RLHF

5.3.2 The quality reward model

5.3.3 The similar-but-different RLHF objective

5.4 Other factors in customizing LLM behavior

5.4.1 Altering training data

5.4.2 Altering base model training

5.4.3 Altering the outputs

5.5 Integrating LLMs into larger workflows

5.5.1 Customizing LLMs with Retrieval Augmented Generation

5.5.2 General-purpose LLM “Programming”

5.6 Summary