10 Introducing customized LLMs

This chapter covers

  • How a lack of context affects an LLM’s performance
  • How retrieval-augmented generation (RAG) works and its value
  • How fine-tuning of LLMs works and its value
  • Comparing RAG and fine-tuning approaches

Over the past few chapters, we've honed our skills in identifying distinct, focused tasks that large language models (LLMs) can support. Combining those skills with a range of prompt-engineering techniques, we've succeeded in getting LLMs to return responses that are valuable to our testing activities. However, despite the lessons we've learned, the responses we receive might still not be fully aligned with our needs and context. Although it would be foolish to think that we can completely align an LLM with our context, there are more advanced options we can use alongside prompt engineering to further maximize the output of an LLM in support of our testing.

So, in this final part, we're going to examine ways to enhance LLMs so that they become more embedded in our context, focusing specifically on retrieval-augmented generation (RAG) and fine-tuning. But before we dig into the details of how these approaches work, we'll first examine why commonly used LLMs such as ChatGPT, Claude, and Gemini can struggle to adapt to our context, and then gradually familiarize ourselves with the more advanced topics of RAG and fine-tuning, comparing them to determine which is more suitable in a given situation.

10.1 The challenge with LLMs and context

10.1.1 Tokens, context windows, and limitations

10.1.2 Embedding context as a solution

10.2 Embedding context further into prompts and LLMs

10.2.1 RAG

10.2.2 Fine-tuning LLMs

10.2.3 Comparing the two approaches

10.2.4 Combining RAG and fine-tuning

Summary