6 Atlas: Few-shot learning with retrieval augmentation
This chapter covers
- The trade-offs between internal and external knowledge
- How smaller models can outperform larger ones using external knowledge
- How to train the retriever based on the reader's performance
- The Perplexity Distillation (PDist) algorithm for joint training
- How Atlas achieved state-of-the-art few-shot learning performance
In the early 2020s, natural language processing (NLP) was largely shaped by scaling laws. Research from labs such as OpenAI and DeepMind showed that increasing a model's parameter count, together with the volume of its training data, led to the emergence of impressive new capabilities, particularly in few-shot learning. Few-shot learning, or more generally N-shot learning, refers to a setup in which a model performs a new task from just N worked examples placed directly in its prompt. With N=0 (zero-shot), the prompt contains only an instruction, for example, "Translate to French: hello." With N=2, the prompt first shows two solved examples ("English: hello → French: bonjour. English: dog → French: chien.") and then asks for the next answer.
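To make the N-shot setup concrete, here is a minimal sketch of how such prompts could be assembled. The function name `build_n_shot_prompt`, its parameters, and the exact prompt template are assumptions chosen for illustration. They mirror the translation example above and are not a format prescribed by Atlas or by any particular model.

```python
from typing import List, Tuple


def build_n_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    n: int,
    instruction: str = "Translate to French:",
) -> str:
    """Build a prompt with the first n solved examples, followed by the query.

    Hypothetical helper: the template is an illustrative assumption.
    """
    if n < 0 or n > len(examples):
        raise ValueError("n must be between 0 and len(examples)")
    if n == 0:
        # Zero-shot: only the instruction and the new input.
        return f"{instruction} {query}"
    # N-shot: n solved examples, then an unfinished line for the model to complete.
    shots = [f"English: {src} → French: {tgt}." for src, tgt in examples[:n]]
    return " ".join(shots) + f" English: {query} → French:"


EXAMPLES = [("hello", "bonjour"), ("dog", "chien")]

zero_shot = build_n_shot_prompt(EXAMPLES, "hello", n=0)
# "Translate to French: hello"

two_shot = build_n_shot_prompt(EXAMPLES, "cat", n=2)
# "English: hello → French: bonjour. English: dog → French: chien. English: cat → French:"
```

The only difference between the zero-shot and two-shot prompts is how many solved examples precede the query. The model and the task stay the same.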