6 Retrieval Augmented Generation (RAG)
This chapter covers:
- A deep dive into Retrieval Augmented Generation (RAG)
- Combining encoders and decoders to generate personalized search results
- The need for chunking larger documents
- Using a vector database instead of FAISS
- Evaluating the performance of search results and RAG
We’ve explored two powerful paradigms in modern AI: semantic search using encoder models and text generation using decoder models (LLMs). While each of these technologies is impressive, their true potential emerges when we combine them. This combination, known as Retrieval Augmented Generation (RAG), represents a significant advancement in building intelligent systems that are both knowledgeable and articulate.
As we discussed in Chapter 5, the fundamental challenge with large language models is their tendency to hallucinate: to generate plausible-sounding but potentially incorrect information. While they excel at understanding and generating human-like text, they lack access to current, verified information about specific domains. This limitation becomes particularly apparent in applications like our Travelle hotel search system, where accurate, up-to-date information is crucial for providing valuable recommendations to users. It is also why general-purpose assistants like ChatGPT now offer integrated web search and research tools: retrieval grounds the model's answers in information it was never trained on.
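Before we dive into the details, here is the shape of the pipeline in code: a minimal sketch of retrieve-then-generate, assuming the sentence-transformers and FAISS libraries from earlier chapters. The hotel snippets, the model name, and the generate() helper are illustrative placeholders rather than Travelle's actual code, and later in this chapter we replace the FAISS index with a vector database:

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Encoder: maps documents and queries into the same vector space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A toy corpus standing in for real hotel descriptions.
documents = [
    "The Grand Plaza offers free breakfast and a rooftop pool.",
    "Seaside Inn is a budget-friendly hotel two blocks from the beach.",
    "Mountain Lodge features ski-in/ski-out access and a full spa.",
]

# Index the document embeddings; with normalized vectors,
# inner product is equivalent to cosine similarity.
doc_vectors = encoder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(np.asarray(doc_vectors, dtype="float32"))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vector = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query_vector, dtype="float32"), k)
    return [documents[i] for i in ids[0]]

def generate(prompt: str) -> str:
    """Placeholder for a decoder/LLM call; plug in your client here."""
    raise NotImplementedError("connect this to the LLM of your choice")

def rag_answer(query: str) -> str:
    """Retrieve context, augment the prompt, and let the LLM generate."""
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return generate(prompt)
```

The retriever supplies facts the model was never trained on, and the prompt constrains the decoder to those facts. The rest of this chapter refines each stage of this pipeline: chunking the documents we index, scaling retrieval with a vector database, and evaluating both the search results and the generated answers.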