8 RAG-Fusion: Multi-query retrieval enhancement
This chapter covers
- The limits of single-query retrieval
- Generating diverse query variants using LLMs
- Applying Reciprocal Rank Fusion (RRF) for result merging
- Building a practical multi-query RAG pipeline
- Tuning for diversity, relevance, and cost
In the previous chapters, you've seen RAG evolve from its roots in classic information retrieval. We've traced a path from keyword-based search to neural retrieval with bi-encoders like Sentence-BERT (chapter 2), then to end-to-end trainable architectures like REALM and the original RAG model (chapters 3 and 4), and, in chapter 7, to methods like HyDE, which use an LLM to generate a single hypothetical document to improve retrieval.
We'll continue that progression, focusing on a critical problem: retrieval coverage. Even with a perfect embedding model, a single query vector is an inherently incomplete snapshot of a user's intent. The result is a system that often answers correctly but incompletely.
Imagine you're a lead investigator trying to solve a mystery, and you can't afford to miss any crucial evidence. Instead of sending one detective out with a single question, you dispatch a whole team. One knocks on doors and interviews neighbors, another digs through old newspapers, and a third pulls security camera footage. Each detective comes back with their best leads, and you combine the strongest clues from everyone into a clearer picture of what happened.