chapter eight

8 RAG-Fusion: Multi-query retrieval enhancement

 

This chapter covers

  • The limits of single-query retrieval
  • Generating diverse query variants using LLMs
  • Applying Reciprocal Rank Fusion (RRF) for result merging
  • Building a practical multi-query RAG pipeline
  • Tuning for diversity, relevance, and cost

In the previous chapters, you saw RAG evolve from its roots in classic information retrieval. We traced a path from keyword-based search to neural retrieval with bi-encoders like Sentence-BERT (chapter 2), then to end-to-end trainable architectures like REALM and the original RAG model (chapters 3 and 4), and finally, in chapter 7, to methods like HyDE, which use an LLM to generate a single hypothetical document to improve retrieval.

This chapter continues that progression by focusing on one critical problem: retrieval coverage. Even with a perfect embedding model, a single query vector captures only one phrasing of the user's intent, so documents that are relevant but framed differently can be missed. The result is a system that often answers correctly but incompletely.

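Imagine you're a lead investigator trying to solve a mystery, and you can't afford to miss any crucial evidence. Instead of sending one detective out with a single question, you dispatch a whole team. One knocks on doors and interviews neighbors, another digs through old newspapers, and a third pulls security camera footage. Each detective comes back with their best leads, and you combine the strongest clues from everyone into a clearer picture of what happened. RAG-Fusion applies the same idea to retrieval: an LLM rewrites the user's question into several diverse query variants, each variant retrieves its own ranked list of documents, and Reciprocal Rank Fusion (RRF) merges those lists into a single ranking.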

8.1 The coverage problem

8.2 LLM-driven query generation

8.2.1 Generating queries with LangChain

8.3 The Reciprocal Rank Fusion algorithm

8.4 Implementing multi-query RAG systems

8.5 Case study: Enhancing e-commerce product search

8.6 Balancing diversity and precision

8.7 Summary