4 Retrieval-augmented generation for knowledge tasks
This chapter covers
- How RAG combines pretrained retrieval and generation components
- Using multiple documents with top-k retrieval
- Choosing between RAG-Sequence and RAG-Token
- Building a complete, working RAG pipeline
- Why RAG became the standard for enterprise AI
Most developers today are familiar with naive RAG: retrieve a few documents, insert them into a prompt, and hope the LLM generates the correct answer. But the original 2020 paper by Patrick Lewis et al. proposed a much more robust approach. They introduced canonical RAG, a probabilistic framework that synthesizes answers by weighing evidence from multiple sources. To build reliable systems today, it’s important to understand this distinction.
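The difference can be made concrete. In canonical RAG, the retriever induces a distribution p(z | x) over documents z given the query x, and the generator's probability p(y | x, z) of an answer y is marginalized over the top-k retrieved documents: p(y | x) ≈ Σ_z p(z | x) · p(y | x, z), rather than stuffing all documents into one prompt and hoping. The sketch below illustrates only the arithmetic of that weighting; every name, retrieval score, and generator probability in it is a fabricated stand-in, not output from any real retriever or model.

```python
import math

def retrieve(query, k=2):
    """Stand-in retriever: returns (doc_id, score) pairs. A real system
    would score passages against the query with dense embeddings."""
    fake_scores = {"doc_a": 9.0, "doc_b": 7.0, "doc_c": 1.0}  # fabricated relevance logits
    return sorted(fake_scores.items(), key=lambda kv: -kv[1])[:k]

def doc_posterior(scored_docs):
    """Softmax over retrieval scores -> the document distribution p(z | x)."""
    m = max(score for _, score in scored_docs)
    exps = [(doc, math.exp(score - m)) for doc, score in scored_docs]
    total = sum(e for _, e in exps)
    return [(doc, e / total) for doc, e in exps]

def gen_prob(answer, query, doc_id):
    """Stand-in for p(y | x, z): the probability a generator would assign
    to the answer given the query and ONE retrieved document."""
    fake = {("doc_a", "Paris"): 0.9, ("doc_b", "Paris"): 0.8, ("doc_c", "Paris"): 0.3}
    return fake.get((doc_id, answer), 0.1)

def rag_sequence_prob(answer, query, k=2):
    """Canonical RAG, RAG-Sequence style: marginalize the per-document
    generator probability over the top-k documents,
        p(y | x) ~= sum_z p(z | x) * p(y | x, z)
    so an answer supported by several well-retrieved documents wins even
    if no single document makes it certain."""
    posterior = doc_posterior(retrieve(query, k))
    return sum(p_z * gen_prob(answer, query, doc) for doc, p_z in posterior)

p = rag_sequence_prob("Paris", "Where is the Eiffel Tower?")
print(f"p(Paris | query) ~= {p:.3f}")
```

With the fabricated numbers above, "doc_a" dominates the posterior, so the answer's overall probability lands close to the generator's score under that document while still drawing some support from "doc_b". That evidence-weighing step is exactly what naive prompt-stuffing skips.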
Published in May 2020, just three months after REALM demonstrated end-to-end retrieval training, the paper by Lewis and his colleagues at Facebook AI, UCL, and NYU asked whether a modular system, one combining pretrained dense retrieval with pretrained generation, could outperform both massive parametric models (like T5-11B) and end-to-end trained systems (like REALM) on knowledge-intensive tasks.