This chapter covers
- The design of RAG systems
- Available tools and technologies that enable a RAG system
- Production best practices for RAG systems
So far, we have discussed the indexing pipeline, the generation pipeline, and the evaluation of a retrieval-augmented generation (RAG) system. Chapter 6 also covered some advanced strategies and techniques that are useful when building production-grade RAG systems. These strategies help improve the accuracy of retrieval and generation and, in some cases, reduce system latency. With all this information, you should be able to stitch together a RAG system for your use cases. Chapter 2 briefly laid out the design of a RAG system. This chapter elaborates on that design.
A RAG system is composed of standard application layers, as well as layers specific to generative AI applications. Stacked together, these layers create a robust RAG system.
These layers are supported by a technology infrastructure. We examine each layer, along with the technologies and tools from popular service providers that can be used to build a RAG system. Some providers have started offering managed end-to-end RAG solutions, which we touch upon in this chapter.
We wrap up the chapter with some lessons and best practices for putting RAG systems into production. Chapter 7 also marks the end of part 3 of the book.
By the end of this chapter, you should