
6 Generate answers with Retrieval Augmented Generation (RAG)

 

This chapter covers

  • Enhancing chatbot responses without coding intents
  • Improving weak understanding with retrieval augmented generation (RAG)
  • Evaluating the advantage of using RAG over traditional search models
  • Selecting the proper RAG technique(s) for your conversational AI
  • Assessing and improving the performance of RAG in your conversational AI systems

In previous chapters, we saw the “chatbot doesn’t understand” pain point for question-answering bots. We first addressed it by helping the chatbot understand more intents, but that strategy eventually hits diminishing returns: uncommon questions from the “long tail” may never make sense as intents. This chapter introduces ways to handle that long tail, including search and retrieval augmented generation (RAG), both effective methods for improving a chatbot’s weak understanding.

The previous chapter concluded with advice on when to avoid adding new intents, especially for diverse, infrequent, domain-specific issues. In this chapter, we introduce search capabilities as a way to improve that weak understanding.

Both search and RAG let you improve a chatbot by adding data and documents without programming new intents. You can serve thousands of question types with the simplicity of training just a few intents, and the answers these methods provide are easier to update: you change the documents rather than the chatbot itself.
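The retrieve-then-generate flow behind RAG can be sketched in a few lines. Everything below is illustrative, not a real implementation: the retrieve and build_prompt functions, the naive keyword-overlap scoring, and the sample documents are hypothetical stand-ins for a production retriever (typically a vector index) and an actual LLM call.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by naive keyword overlap with the question.

    A real system would embed the question and search a vector index;
    word overlap is used here only to keep the sketch self-contained.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def build_prompt(question, passages):
    """Assemble the grounding prompt an LLM would answer from."""
    context = "\n".join(passages)
    return (
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical document repository -- in practice these come from your
# ingested knowledge base, not from code.
docs = [
    "Refunds are processed within 5 business days.",
    "Our stores open at 9 a.m. on weekdays.",
]

passages = retrieve("When are refunds processed?", docs)
prompt = build_prompt("When are refunds processed?", passages)
# The prompt would now be sent to an LLM to generate the final answer.
```

Note that updating the bot’s answer here means editing the docs list (that is, your document repository), not retraining intents — which is exactly the maintenance advantage described above.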

6.1 Beyond intents: The role of search in conversational AI

6.1.1 Using search in conversational AI

6.1.2 Benefits of traditional search

6.1.3 Drawbacks of traditional search

6.2 Beyond search: Generate answers with RAG

6.2.1 Using RAG in conversational AI

6.2.2 Benefits of RAG

6.2.3 Combining RAG with other genAI use cases

6.2.4 Comparing intents, search, and RAG approaches

6.2.5 Exercises

6.3 How is RAG implemented?

6.3.1 High-level implementation

6.3.2 Preparing your document repository for RAG

6.3.3 Exercises

6.4 Additional considerations of RAG implementations

6.4.1 Can’t we just use an LLM directly?

6.4.2 Keeping answers current and relevant with RAG

6.4.3 How easy is it to set up the ingestion pipeline?

6.4.4 Handling latency

6.4.5 When should you use a fallback mechanism, and when should you search?

6.5 Evaluation and analytics of RAG

6.5.1 Indexing metrics

6.5.2 Retrieval metrics

6.5.3 Generation metrics

6.5.4 Comparing efficiency of indexing and embedding solutions for RAG

6.5.5 Exercises

6.6 Summary