7 Retrieval-augmented generation: The secret weapon

This chapter covers

  • Concepts of retrieval-augmented generation (RAG)
  • Benefits of combining the RAG architecture with large language models
  • The role of vector databases and indexes in implementing RAG
  • Basics of vector search and common distance functions
  • Challenges in RAG implementation and potential solutions
  • Methods of chunking text for RAG

As we have seen, large language models (LLMs) are remarkably powerful and let us accomplish things that were not possible until very recently. Notably, LLMs capture a vast amount of the world's knowledge and are available to anyone, anywhere in the world, through an API.

However, LLMs have a knowledge constraint: their knowledge extends only up to their training cutoff date, and they have no information about anything that happened after it. Consequently, LLMs cannot make use of the latest information. In addition, an LLM's training corpus contains no private, nonpublic knowledge, so LLMs cannot answer questions that are specific to an enterprise and its proprietary data.
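To make the remedy concrete before we dive into the details, here is a minimal sketch of the core RAG idea: retrieve the documents most relevant to a question and prepend them to the prompt, so the model can answer from data it never saw during training. Everything here is simplified and hypothetical: the DOCUMENTS list, the retrieve and build_augmented_prompt helpers, and the bag-of-words embed function (a stand-in for a real embedding model) are illustrative names, not any particular library's API.

import math

# Toy corpus of private documents the LLM never saw during training.
# In practice, these would be chunks drawn from an enterprise knowledge base.
DOCUMENTS = [
    "Acme's Q3 2024 revenue was $12.4M, up 8% year over year.",
    "The Acme VPN requires rotating credentials every 90 days.",
    "Acme's refund policy allows returns within 30 days of purchase.",
]

def embed(text: str) -> dict[str, float]:
    """Stand-in embedding: a bag-of-words frequency vector.
    A real system would call an embedding model here instead."""
    vec: dict[str, float] = {}
    for token in text.lower().split():
        vec[token] = vec.get(token, 0.0) + 1.0
    return vec

def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine_similarity(q, embed(d)), reverse=True)
    return ranked[:k]

def build_augmented_prompt(query: str) -> str:
    """Prepend retrieved context so the LLM can answer from private data."""
    context = "\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

print(build_augmented_prompt("What is Acme's refund policy?"))

Running this prints a prompt that carries Acme's refund policy as context for the model. The rest of the chapter replaces each stand-in with the real machinery: a proper retriever system, vector databases and indexes, and vector search with appropriate distance functions.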

7.1 What is RAG?

7.2 RAG benefits

7.3 RAG architecture

7.4 Retriever system

7.5 Understanding vector databases

7.5.1 What is a vector index?

7.5.2 Vector search

7.6 RAG challenges

7.7 Overcoming challenges for chunking

7.7.1 Chunking strategies

7.7.2 Factors affecting chunking strategies

7.7.3 Handling unknown complexities

7.7.4 Chunking sentences

7.7.5 Chunking using natural language processing

7.8 Chunking PDFs

Summary