chapter seven

7 Evolving RAGOps stack

This chapter covers

The design of RAG systems
Available tools and technologies that enable a RAG system
Production best practices for RAG systems

So far, we have discussed the indexing pipeline, the generation pipeline, and the evaluation of a retrieval-augmented generation (RAG) system. Chapter 6 also covered some advanced strategies and techniques that are useful when building production-grade RAG systems. These strategies help improve the accuracy of retrieval and generation and, in some cases, reduce system latency. With all this information, you should be able to stitch together a RAG system for your use cases. Chapter 2 briefly laid out the design of a RAG system. This chapter elaborates on that design.

A RAG system is composed of standard application layers, as well as layers specific to generative AI applications. Stacked together, these layers create a robust RAG system.

These layers are supported by a technology infrastructure. We delve into these layers and into the technologies and tools from popular service providers that can be used to build a RAG system. Some providers have started offering managed end-to-end RAG solutions, which we touch upon in this chapter.

We wrap up the chapter with some lessons learned and best practices for putting RAG systems into production. Chapter 7 also marks the end of part 3 of the book.

7.1 The evolving RAGOps stack

7.1.1 Critical layers

7.1.2 Essential layers

7.1.3 Enhancement layers

7.2 Production best practices

Summary
