chapter ten
10 RAPTOR: Recursive abstractive processing for tree-organized retrieval
This chapter covers
- The Semantic Flattening problem and retrieval failures
- The RAPTOR architectural pattern
- Soft clustering with Uniform Manifold Approximation and Projection (UMAP) and Gaussian Mixture Models (GMM)
- Building a recursive summarization engine
- Indexing and retrieval strategies for multi-layered trees
Imagine you are a historian tasked with writing a biography of a complex political figure. You enter the national archives expecting to find a structured system: boxes of diaries, organized correspondence, chronological reports, and thematic dossiers. Instead, you find a warehouse floor covered in millions of scraps of paper. Every document has been shredded into individual paragraphs, scattered randomly, stripped of folders, dates, and context.
If you need to answer, "On what date was the armistice signed?" you might succeed. You can sift through the pile, scan for keywords like "armistice" and "signed," find the relevant scrap, and retrieve the fact. This is how naive RAG works: finding needles in haystacks.