chapter three

3 External knowledge and retrieval

This chapter covers

External knowledge as a source of context beyond an LLM’s training data
Retrieval-augmented generation (RAG) to encode and query external knowledge
Knowledge graphs to structure external knowledge and retrieve it (graph RAG)
Advanced retrieval approaches, including hybrid, agentic, and vectorless RAG
Cache-augmented generation (CAG) to preload external knowledge
Context stuffing for direct injection of external knowledge

Chapter 1 introduced external knowledge as one of the six foundational sources of context in LLM-based systems. This chapter examines how LLMs use this context source to reach beyond their training data and produce responses grounded in real-time or domain-specific information. One of the most popular ways to incorporate external knowledge is retrieval, implemented in workflows such as retrieval-augmented generation (RAG) and its more advanced variants (graph-based, hybrid, agentic, and vectorless RAG). To make these workflows more efficient, caching techniques such as cache-augmented generation (CAG) can preload external knowledge. Finally, context injection techniques (so-called context stuffing) place external knowledge directly into the context window.

3.1 External knowledge

3.2 Retrieval-augmented generation

3.2.1 The indexing pipeline

3.2.2 The generation pipeline

3.2.3 Chunking strategies

3.2.4 Retrieval models and vector databases

3.2.5 RAG tools

3.3 Knowledge graphs

3.3.1 Graph structure and modeling

3.3.2 Graph RAG

3.4 Hybrid RAG

3.5 Agentic RAG

3.6 Vectorless RAG

3.7 Cache-augmented generation

3.8 Context stuffing

3.9 Hands-on

3.9.1 A minimal RAG implementation with an OpenAI model

3.9.2 RAG with an open-source generation model

3.9.3 Fully local RAG system

3.9.4 RAG with RAGFlow

3.9.5 Agentic RAG with LangChain

3.9.6 Vectorless RAG with PageIndex