chapter fifteen

15 Production RAG: Metrics, agentic systems, and continuous improvement

This chapter covers

Measuring RAG stability and scalability with RAGGED
Building an agentic RAG system with reflection and planning
Unifying context ranking and answer generation in a single fine-tuned model
Self-improving pipelines that learn from experience
Mapping failure points to RAG technique selection

Throughout this book, we have traced RAG's evolution from its information retrieval roots through foundational models (REALM, RAG), multi-document fusion (FiD, Atlas), query enhancement (HyDE, RAG-Fusion), adaptive retrieval (Self-RAG, FLARE, CRAG), graph-based approaches, context compression, and systematic evaluation. Each technique targeted specific failure points: missing content, missed top-ranked documents, hallucination, context loss, and incomplete extraction.

What we have not yet addressed is how these techniques compose into production systems, how to choose between them, and where the field is heading. That last question is more concrete than it might sound, because the research trajectory is already visible: RAG systems are becoming more autonomous, better at judging when to retrieve and whether their own outputs are supported, and more tightly coupled with the LLMs they serve.

15.1 The evolving RAG landscape

15.1.1 More retrieval is not always better

15.1.2 Measuring stability and scalability

15.1.3 The reader is the bottleneck

15.2 Agentic RAG: Autonomous retrieval systems

15.2.1 Four design patterns

15.2.2 From patterns to pipelines

15.2.3 Practical implementation

15.2.4 When agentic RAG helps and when it does not

15.3 Fine-tuning in the RAG lifecycle

15.3.1 RankRAG: Unifying ranking and generation

15.3.2 Results and implications

15.3.3 Where fine-tuning fits in the RAG lifecycle

15.4 Self-improving RAG: The feedback loop

15.4.1 The gatekeeper: Learning when to trust retrieval

15.4.2 The critic: Learning which memories to trust

15.4.3 The student: Learning to teach itself to use retrieval

15.4.4 Building the feedback loop

15.5 Practical frameworks for RAG technique selection

15.5.1 Complexity versus performance

15.5.2 Mapping failure modes to RAG approaches

15.5.3 Decision tree for new projects

15.6 Best practices for production deployment

15.6.1 Instrument everything

15.6.2 Version your pipeline as a unit

15.6.3 RAG framework comparison

15.6.4 The testing pyramid for RAG

15.7 Case study: Agentic customer support in fintech

15.7.1 The problem