2 RAG systems and their design

This chapter covers

  • The concept and design of RAG systems
  • An overview of the indexing pipeline
  • An overview of the generation pipeline
  • An initial look at RAG evaluation
  • A high-level look at the RAG operations stack

The first chapter explored the core principles behind retrieval-augmented generation (RAG) and the large language model (LLM) challenges that it addresses. Constructing a RAG system requires assembling several components. One pipeline, the indexing pipeline, creates and maintains the system's non-parametric memory, or knowledge base. Another, the generation pipeline, handles real-time interaction: it sends prompts to the LLM and accepts its responses, with retrieval and augmentation steps in between. Evaluation is a further critical component, verifying that the system is effective and accurate. All these components are supported by the layers of the operations stack.
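To make the division of labor concrete, the two pipelines can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a reference design: every name here (`KnowledgeBase`, `index_documents`, `retrieve`, `augment`, `fake_llm`, `generate`) is hypothetical, retrieval uses plain word overlap where a real system would typically use embeddings and a vector store, and `fake_llm` stands in for a call to an actual model.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy non-parametric memory: a flat list of text chunks (assumption)."""
    chunks: list = field(default_factory=list)


def tokenize(text):
    # Lowercased word set with basic punctuation stripped.
    return {word.strip(".,?!").lower() for word in text.split()}


def index_documents(kb, documents):
    # Indexing pipeline: here each document is stored whole as one chunk.
    kb.chunks.extend(documents)


def retrieve(kb, query, k=1):
    # Retrieval step: rank chunks by how many words they share with the query.
    query_words = tokenize(query)
    ranked = sorted(
        kb.chunks,
        key=lambda chunk: len(query_words & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:k]


def augment(query, context):
    # Augmentation step: combine the retrieved context and the user query.
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


def fake_llm(prompt):
    # Placeholder for a real LLM call (hypothetical stand-in).
    return f"[answer based on {len(prompt)} prompt characters]"


def generate(kb, query, llm=fake_llm):
    # Generation pipeline: retrieve, augment, then send the prompt to the LLM.
    context = retrieve(kb, query)
    prompt = augment(query, context)
    return llm(prompt)


if __name__ == "__main__":
    kb = KnowledgeBase()
    index_documents(kb, [
        "Paris is the capital of France.",
        "The Nile is a river in Africa.",
    ])
    print(generate(kb, "What is the capital of France?"))
```

The indexing function runs ahead of time and only writes to the knowledge base, while `generate` runs per query and only reads from it. That separation is what lets the two pipelines be built and maintained independently.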

2.1 What does a RAG system look like?

2.2 Design of RAG systems

2.3 Indexing pipeline

2.4 Generation pipeline

2.5 Evaluation and monitoring

2.6 The RAGOps stack

2.7 Caching, guardrails, security, and other layers

Summary