chapter nine

9 Self-RAG: Retrieval with reflection and self-critique

This chapter covers

  • Moving from passive retrieval to active reasoning
  • Using reflection tokens for self-critique
  • Training generators to emulate proprietary critics
  • Controlling model behavior at inference time

By late 2023, RAG had established itself as the dominant pattern for working around the static knowledge baked into LLMs. Lewis et al. (2020) and Fusion-in-Decoder (Izacard and Grave, 2021) provided the mechanics for connecting parametric memory (the model's weights) with non-parametric memory (external vector databases). As practitioners pushed these systems from research prototypes into production, a recurring set of limitations surfaced, often called the "Naive RAG" bottlenecks. The standard architecture was passive: it ran a vector search for every user query, whether or not the query needed external knowledge.

That indiscriminate approach kept manufacturing the same families of failures from the Barnett et al. taxonomy in chapter 1 (table 1.1): wasted compute on queries that didn't need retrieval, polluted context windows that crowded out the answer, and ungrounded hallucinations like the one behind the Air Canada incident. We’ll walk through each failure mode and map it onto the specific failure points Self-RAG was designed to neutralize.

9.1 The challenges of controlling retrieval and generation

9.1.1 The limitations of passive retrieval architectures

9.1.2 The evolution toward active reasoning

9.2 Reflection tokens for retrieval, relevance, and critique

9.2.1 The concept of self-reflection

9.2.2 The reflection token taxonomy

9.3 Training for self-evaluation

9.3.1 The critic-driven training pipeline

9.3.2 Inference: Tree decoding and control

9.3.3 Adaptive retrieval thresholding

9.4 Implementing prompted Self-RAG

9.4.1 Defining the graph state

9.4.2 Emulating reflection tokens

9.4.3 The control logic

9.5 Real-world applications

9.6 Where Self-RAG fits, and what replaced it

9.7 Related research: Fact-checking on top of RAG

9.8 Summary