This chapter covers
- How RAG works
- Using tooling to create a basic RAG setup
- Integrating vector databases into a RAG setup
As we learned in the previous chapter, one of the challenges of working with large language models (LLMs) is that they lack visibility into our context. In the second part of this book, we saw different ways to arrange our prompts to provide small insights into that context. However, these types of prompts can only take us so far before the lack of extra context leads to less valuable responses. Therefore, to increase the value of an LLM’s response, we need to place more contextual detail into our prompt. In this chapter, we’ll explore how to do this through retrieval-augmented generation, or RAG. We’ll learn how RAG works, why it’s beneficial, and how it’s not a big jump from prompt engineering to RAG. We’ll then build our own RAG framework examples to establish our understanding of how they can help us in a testing context.
11.1 Extending prompts with RAG
To recap, RAG is an approach to improving the quality of an LLM’s response by combining an existing corpus of data with a prompt. Although this broadly explains what RAG does, we need to dig a little deeper to grasp how this combination is achieved. The process of a RAG system is relatively straightforward and can be summarized as shown in figure 11.1.
Figure 11.1 A visualization of how a basic RAG system works
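To make the flow concrete, here is a minimal sketch of the retrieve, augment, and generate steps in plain Python. The sample corpus, the word-overlap scoring in retrieve, and the build_prompt helper are all illustrative assumptions rather than any specific library’s API; later in the chapter, we’ll see how a vector database replaces this naive retrieval step.

```python
# A minimal, self-contained sketch of the retrieve -> augment -> generate flow.
# The corpus contents and the word-overlap scoring are illustrative assumptions.

corpus = [
    "Checkout applies a 10% discount for orders over 50 GBP.",
    "Password reset links expire after 24 hours.",
    "The API rate limit is 100 requests per minute.",
]

def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Retrieve: score each document by how many words it shares with the question."""
    question_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Augment: combine the retrieved context with the user's question into one prompt."""
    context_block = "\n".join(context)
    return f"Using the following context:\n{context_block}\n\nAnswer this question: {question}"

question = "How long does a password reset link last?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # Generate: this augmented prompt is what we would send to an LLM
```

Notice that the LLM itself is untouched; all the work happens before the prompt is sent, which is why RAG is a natural next step from the prompt engineering we’ve already done.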
