chapter five

5 Fusion-in-Decoder for multi-document processing

This chapter covers

Limitations of RAG systems when processing many documents
How the "independent encoding, fused decoding" architecture of Fusion-in-Decoder (FiD) solves this problem
Practical implementation of core FiD logic
Why FiD is a critical step toward advanced multi-document synthesis

Canonical and naive RAG provide a powerful framework for grounding language models in external knowledge. But consider a financial analyst who asks, "What were the primary drivers of Apple's revenue growth last year?" A complete answer requires synthesizing a market report, a competitor overview, and a shareholder letter. RAG retrieves the documents but often fails to combine them, treating each passage in isolation.

This approach has a key limitation: it depends heavily on how well the initial retriever ranks its results. If the necessary information does not land in the top one or two documents, the system fails. What happens when the complete answer is not contained in one or two top-ranked passages but is scattered in pieces across a larger set of top-k documents (e.g., k=20 or k=50)?

5.1 The challenges of multiple document contexts

5.2 Independent encoding, fused decoding

5.3 Scaling to hundreds of documents

5.3.1 Scaling training

5.4 Implementing FiD with transformer models

5.5 Case study: Finding the FiD sweet spot

5.6 Applications and performance characteristics

5.6.1 FiD's evolution

5.7 Summary