This chapter covers
- Retrievers and retrieval methodologies
- Augmentation using prompt engineering techniques
- Generation using LLMs
- Basic implementation of the RAG pipeline in Python
In chapter 3, we discussed how the indexing pipeline creates the knowledge base, the non-parametric memory of retrieval-augmented generation (RAG) applications. To use this knowledge base to produce accurate, contextual responses, we need a generation pipeline consisting of three steps: retrieval, augmentation, and generation.
This chapter elaborates on the three components of the generation pipeline. We begin with the retrieval process, which searches the embeddings stored in the knowledge base's vector database and returns a list of documents that closely match the user's input query. You will also learn about the concept of retrievers and a few retrieval algorithms. Next, we move to the augmentation step, where we examine prompt engineering frameworks commonly used with RAG to combine the retrieved context with the user's query. Finally, as part of the generation step, we discuss key choices in selecting an LLM, such as foundation models versus supervised fine-tuned models, models of different sizes, and open source versus proprietary models in the RAG context. For each step, we also highlight the benefits and drawbacks of different methods.
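Before diving into each component, the three steps can be sketched end to end. The following is a minimal, self-contained illustration, not the implementation developed in this book: it stands in for real embeddings with term-frequency vectors, ranks documents by cosine similarity, fills a simple prompt template, and stubs out the LLM call. The document texts, the `embed`/`retrieve`/`augment`/`generate` names, and the prompt wording are all hypothetical choices for this sketch.

```python
import math
from collections import Counter

# Toy knowledge base. In a real application, documents and their
# embeddings live in a vector database built by the indexing pipeline.
DOCUMENTS = [
    "RAG combines retrieval with generation to ground LLM answers.",
    "The indexing pipeline chunks documents and stores embeddings.",
    "Prompt engineering shapes how context is presented to the LLM.",
]

def embed(text):
    """Stand-in embedding: a bag-of-words term-frequency vector.
    A real system would call an embedding model instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    """Step 1, retrieval: return the k documents closest to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def augment(query, context):
    """Step 2, augmentation: inject retrieved context into a prompt template."""
    joined = "\n".join(f"- {doc}" for doc in context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{joined}\n\nQuestion: {query}")

def generate(prompt):
    """Step 3, generation: placeholder for an actual LLM API call."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

query = "How does the indexing pipeline work?"
answer = generate(augment(query, retrieve(query)))
```

Each placeholder maps onto a component discussed in this chapter: `retrieve` is replaced by a retriever over a vector database, `augment` by a prompt engineering framework, and `generate` by a call to a chosen LLM.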