
3 REALM: Birth of end-to-end trainable RAG


This chapter covers

  • The challenges of open-domain question answering
  • From symmetric semantic search to asymmetric, task-specific retrieval
  • REALM’s innovation: joint pre-training
  • Training a retriever with a language model’s objective

We’ve assembled the essential components of a scalable retrieval system: semantic understanding from Word2Vec, sentence embeddings from Sentence-BERT, and large-scale search using libraries like FAISS. SBERT uses contrastive learning to capture conceptual or semantic similarity between pairs of sentences or texts, independently of any downstream task. Such a system can retrieve documents, but when it comes to answering a question, it returns only a ranked list, leaving the user to read, synthesize, and extract the final answer.
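The retrieve-then-read gap can be sketched with a toy pipeline. This is a minimal illustration, not production code: the hard-coded two-dimensional vectors stand in for real SBERT embeddings, and the plain NumPy dot product stands in for a FAISS index (it computes the same inner-product score that FAISS's `IndexFlatIP` would on normalized vectors).

```python
import numpy as np

# Toy corpus. In practice the embeddings would come from a call like
# SentenceTransformer.encode(); here we use made-up 2-D vectors.
docs = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
    "The capital of France is Paris.",
]
doc_vecs = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])  # pretend embeddings
query_vec = np.array([0.85, 0.15])  # pretend embedding of "Where is the Eiffel Tower?"

def normalize(v):
    """L2-normalize so the dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Score every document against the query and take the top k.
scores = normalize(doc_vecs) @ normalize(query_vec)
top_k = np.argsort(-scores)[:2]
for i in top_k:
    print(docs[i])  # a ranked list of documents -- not an answer
```

Note what the pipeline hands back: document indices sorted by similarity, nothing more. Extracting "Paris" from the retrieved text is left entirely to the user.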

In a standard pipeline, the retriever is frozen: it selects documents according to a fixed similarity model (such as SBERT embeddings or keyword matching) and never learns whether the documents it returns actually help answer the question. This leads directly to FP2 (Missed the Top Rank): the retriever surfaces documents that look similar to the query but lack the specific evidence needed.

REALM’s innovation was to unfreeze the retriever. Instead of treating retrieval and reading as separate steps, it trained them as a single, end-to-end system. The retriever learns to update its search strategy based on whether the reader successfully predicts the correct answer (instrumental utility).
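Before diving in, it helps to see the shape of this joint objective. The sketch below is an illustrative simplification of REALM's training signal (not the original implementation, and with made-up numbers): the retriever's scores define a distribution p(z|x) over documents, the reader supplies p(y|x,z) for the correct answer, and the model maximizes the marginal likelihood p(y|x) = Σ_z p(z|x)·p(y|x,z). Differentiating the log marginal with respect to the retrieval scores shows how the reader's success becomes the retriever's training signal.

```python
import numpy as np

# Made-up values for one query x and three retrieved documents z.
retrieval_logits = np.array([2.0, 1.0, 0.5])  # retriever's scores for each doc
reader_prob = np.array([0.05, 0.90, 0.10])    # reader's p(correct answer | x, z)

# Softmax over retrieval scores gives p(z | x).
p_z = np.exp(retrieval_logits) / np.exp(retrieval_logits).sum()

# Marginal likelihood of the answer: p(y | x) = sum_z p(z | x) * p(y | x, z).
p_y = (p_z * reader_prob).sum()

# Gradient of log p(y | x) w.r.t. each retrieval logit:
#   grad_k = p_z[k] * (reader_prob[k] - p_y) / p_y
# Positive for documents whose reader probability beats the current marginal,
# negative otherwise -- the reader's success directly reshapes retrieval.
grad = p_z * (reader_prob - p_y) / p_y
print(grad)
```

Here document 1 is ranked second by the retriever but is by far the most useful to the reader, so it receives the only positive gradient: training pushes its retrieval score up while pushing down the look-alike document 0. This is instrumental utility in miniature.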

3.1 Open-domain question answering

3.2 Retrieval-augmented language model pre-training

3.2.1 Test results

3.3 Practical implementation with Python

3.4 REALM's enduring impact and legacy

3.5 Summary