chapter four

4 Advancing trust & minimizing hallucinations with retrieval augmented generation

This chapter covers

Retrieval augmented generation (RAG) and how it overcomes limitations of stand-alone LLMs
Core components of a RAG architecture: retrievers, generators, and orchestrators
Indexing and structuring knowledge sources to enable relevant passage retrieval
Building sample RAG systems with LangChain to simplify orchestration

The previous chapter showed how prompting, the careful crafting of input text for large language models (LLMs), can shape helpful, eloquent chatbot responses. Despite that potential, prompting alone leaves major gaps when a chatbot has to provide real-world assistance.

Figure 4.1 Basic prompting by interacting with an LLM directly. [1]

With basic prompting (shown in figure 4.1), an LLM has no direct way to access live external data beyond its training corpus. Yet incorporating dynamic knowledge is crucial for a chatbot, as our retail ecommerce example shows. Can prompting alone answer a shopper who asks an inventory question like the one in figure 4.2?

Figure 4.2 Asking an LLM (Claude 2) a specific question related to inventory systems

4.1 What is retrieval augmented generation?

4.2 RAG system architecture

4.3 Reducing hallucinations with RAG

4.4 Data preparation and reliable indexing for RAG systems

4.4.1 Introduction to LangChain

4.4.2 Structuring product catalogs

4.4.3 Creating the searchable vector index

4.4.4 Implementing a FAISS index

4.5 Building an effective RAG system

4.5.1 Processing user queries

4.5.2 Crafting an accurate response using an LLM

4.6 Evaluating and optimizing RAG systems

4.6.1 The role of evaluator LLMs in assessing hallucinations

4.6.2 Crafting evaluation datasets

4.6.3 Practical metrics for RAG evaluation

4.6.4 Reducing hallucinations by filtering retrievals and leveraging metadata

4.6.5 Embedding chunk sizes and model choice

4.7 Advanced techniques for improving RAG

4.7.1 Fine-tuning embeddings for in-domain relevance

4.7.2 Incorporating metadata to re-establish context

4.7.3 Implementing hybrid retrieval

4.7.4 Enhancing queries via rephrasing and augmentation

4.7.5 Implementing query routing

4.8 Building a RAG system for an ecommerce chatbot with LangChain

4.8.1 Challenges and adaptations of the chatbot

4.9 Summary

4.10 References