15 Production RAG: Metrics, agentic systems, and continuous improvement
This chapter covers
- Measuring RAG stability and scalability with RAGGED
- Building agentic RAG systems with reflection and planning
- Unifying fine-tuning and retrieval in a single model
- Building self-improving pipelines that learn from experience
- Mapping failure points to RAG technique selection
Throughout this book, we have traced RAG's evolution from its information retrieval roots through foundational models (REALM, RAG), multi-document fusion (FiD, Atlas), query enhancement (HyDE, RAG-Fusion), adaptive retrieval (Self-RAG, FLARE, CRAG), graph-based approaches, context compression, and systematic evaluation. Each technique addressed specific failure points: missing content, missed rankings, hallucination, context loss, and incomplete extraction.
What we have not yet addressed is how these techniques compose into production systems, how to choose among them, and where the field is heading. That last question is more grounded than it might sound. The research trajectory is clear: RAG systems are becoming more autonomous, more self-aware, and more tightly coupled with the LLMs they serve.
15.1 The evolving RAG landscape
15.1.1 More retrieval is not always better
15.1.2 Measuring stability and scalability
15.1.3 The reader is the bottleneck
15.2 Agentic RAG: Autonomous retrieval systems
15.2.1 Four design patterns
15.2.2 From patterns to pipelines
15.2.3 Practical implementation
15.2.4 When agentic RAG helps and when it does not
15.3 Fine-tuning in the RAG lifecycle
15.3.1 RankRAG: Unifying ranking and generation
15.3.2 Results and implications
15.3.3 Where fine-tuning fits in the RAG lifecycle
15.4 Self-improving RAG: The feedback loop
15.4.1 The gatekeeper: Learning when to trust retrieval
15.4.2 The critic: Learning which memories to trust
15.4.3 The student: Learning to teach itself to use retrieval
15.4.4 Building the feedback loop
15.5 Practical frameworks for RAG technique selection
15.5.1 Complexity versus performance
15.5.2 Mapping failure modes to RAG approaches
15.5.3 Decision tree for new projects
15.6 Best practices for production deployment
15.6.1 Instrument everything
15.6.2 Version your pipeline as a unit
15.6.3 RAG framework comparison
15.6.4 The testing pyramid for RAG
15.7 Case study: Agentic customer support in fintech
15.7.1 The problem