5 Fusion-in-Decoder for multi-document processing
This chapter covers
- Limitations of RAG systems when processing many documents
- How FiD's "independent encoding, fused decoding" architecture addresses these limitations
- Practical implementation of core FiD logic
- Why FiD is a critical step toward advanced multi-document synthesis
Canonical and Naive RAG provide a powerful framework for grounding language models in external knowledge. But imagine a financial analyst asking, "What were the primary drivers of Apple's revenue growth last year?" A complete answer requires synthesizing a market report, a competitor overview, and a shareholder letter. A standard RAG pipeline may retrieve all three documents, yet it often fails to combine them, treating each passage in isolation.
This approach has a second limitation: it depends heavily on the accuracy of the initial retriever. If the necessary information isn't ranked in the top one or two documents, the system fails. What happens when the complete answer isn't contained in one or two top-ranked passages but is scattered in pieces across a larger set of top-k documents (e.g., k=20 or k=50)?