8 Advanced indexing
This chapter covers
- Advanced RAG techniques for more effective retrieval
- Selecting the optimal chunk splitting strategy for your use case
- Using multiple embeddings to enhance coarse chunk retrieval
- Expanding granular chunks to add context during retrieval
- Indexing strategies for semi-structured and multi-modal content
In Chapter 7, you explored the fundamentals of retrieval-augmented generation (RAG), a core architecture for building LLM-powered applications. To keep things simple, we worked with a stripped-down version. That minimal setup is useful for learning, but in practice it often leads to disappointing results: inaccurate answers, overlooked data, or weak use of context, even when the vector store contains exactly what you need. These issues usually stem from vague queries, suboptimal indexing, or a failure to leverage metadata effectively.
This chapter focuses on how to overcome those challenges. Building robust LLM applications with LangChain is less about wiring components together and more about refining the design: iterating on retrieval strategies, experimenting with prompts, and applying advanced RAG techniques. True proficiency comes from mastering these refinements.
We’ll begin with advanced indexing strategies, such as storing multiple embeddings per large text chunk in the vector database, so that a fine-grained query can still match a coarse chunk. This improves retrieval precision and supplies richer, more accurate context for response generation.
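As a preview, here is a minimal sketch of the idea using LangChain's MultiVectorRetriever: each large parent chunk is split into small child chunks, only the children are embedded, and retrieval maps a matching child back to its parent. The import paths (langchain_openai, langchain_chroma, langchain_text_splitters) assume a recent LangChain release with the OpenAI and Chroma integrations installed; the sample text and the doc_id key are illustrative, not fixed by the library.

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Large "parent" chunks: what we ultimately want to hand to the LLM.
parent_docs = [
    Document(page_content=(
        "LangChain's indexing pipeline loads documents, splits them into "
        "chunks, embeds the chunks, and stores the vectors. The choice of "
        "chunk size trades retrieval precision against context richness."
    )),
]

# The vector store holds the small, embedded child chunks; the docstore
# holds the full parent chunks, keyed by a shared ID.
# (Assumes OPENAI_API_KEY is set in the environment.)
vectorstore = Chroma(
    collection_name="parent_chunks", embedding_function=OpenAIEmbeddings()
)
docstore = InMemoryStore()
id_key = "doc_id"  # metadata field linking each child back to its parent

retriever = MultiVectorRetriever(
    vectorstore=vectorstore, docstore=docstore, id_key=id_key
)

# Split each parent into fine-grained children, tagged with the parent's ID.
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
doc_ids = [str(uuid.uuid4()) for _ in parent_docs]
child_docs = []
for doc_id, parent in zip(doc_ids, parent_docs):
    for child in child_splitter.split_documents([parent]):
        child.metadata[id_key] = doc_id
        child_docs.append(child)

# Embed only the children; store the parents verbatim.
retriever.vectorstore.add_documents(child_docs)
retriever.docstore.mset(list(zip(doc_ids, parent_docs)))

# Similarity search runs over the small child embeddings,
# but the retriever returns the full parent chunks.
results = retriever.invoke("How does chunk size affect retrieval?")
```

Because the query is matched against small, focused embeddings while the LLM receives the larger parent chunk, this pattern captures the best of both granularities. We'll unpack it, along with its variants, later in the chapter.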