6 Enhancing responses with retrieval-augmented generation


This chapter covers

  • Enhancing chatbot responses without coding intents
  • Improving weak understanding with RAG
  • Evaluating the advantages of RAG over traditional search
  • Selecting the proper RAG techniques for your conversational AI
  • Assessing and improving the performance of RAG in your conversational AI systems

In previous chapters, we saw the “chatbot doesn’t understand” pain point for question-answering bots. We first addressed it by helping the chatbot understand more intents, but at some point there are diminishing returns to this strategy. Uncommon questions from the “long tail” may never make sense to implement as intents. This chapter introduces ways to handle that “long tail,” including search and retrieval-augmented generation (RAG). These are great methods for improving a chatbot’s weak understanding.

We concluded chapter 5 with advice on when to avoid adding new intents, especially for diverse, infrequent, domain-specific questions. In this chapter, we’ll add search and retrieval capabilities to improve weak understanding.

Both search and RAG let you improve a chatbot by adding data and documents instead of programming new intents, so you can serve the equivalent of thousands of intents while training only a few. The answers these methods provide are also easier to update: change the documents rather than the chatbot itself.
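To make that workflow concrete, here is a minimal sketch of the retrieve-then-generate flow in Python. The names here (DOCUMENTS, retrieve, build_prompt) are illustrative placeholders, not a specific library’s API: a production system would replace the naive keyword retriever with a search index or vector store and send the assembled prompt to an LLM.

```python
# A minimal sketch of retrieve-then-generate, assuming a toy in-memory
# document store and a naive keyword retriever (hypothetical names).

DOCUMENTS = {
    "refunds.md": "Refunds are issued within 5 business days of approval.",
    "shipping.md": "Standard shipping takes 3 to 7 business days.",
    "warranty.md": "All products carry a one-year limited warranty.",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by naive term overlap with the question.
    A real system would use a search index or vector store here."""
    terms = set(question.lower().split())
    ranked = sorted(
        DOCUMENTS.values(),
        key=lambda text: len(terms & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Updating an answer means editing DOCUMENTS -- no new intent required.
question = "How long do refunds take?"
print(build_prompt(question, retrieve(question)))
```

Note that handling a new long-tail question requires only adding a document to the store; nothing about the chatbot itself is retrained.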

6.1 Beyond intents: The role of search in conversational AI

6.1.1 Using search in conversational AI

6.1.2 Benefits of traditional search

6.1.3 Drawbacks of traditional search

6.2 Beyond search: Generating answers with RAG

6.2.1 Using RAG in conversational AI

6.2.2 Benefits of RAG

6.2.3 Combining RAG with other generative AI use cases

6.2.4 Comparing intents, search, and RAG approaches

6.3 How is RAG implemented?

6.3.1 High-level implementation

6.3.2 Preparing your document repository for RAG

6.4 Additional considerations of RAG implementations

6.4.1 Can’t we just use an LLM directly?

6.4.2 Keeping answers current and relevant with RAG

6.4.3 How easy is it to set up the ingestion pipeline?

6.4.4 Handling latency

6.5 Assessing RAG performance

6.5.1 Indexing metrics