chapter eight

8 Embeddings, Vector Databases and Retrieval

This chapter covers

How embedding models power retrieval-augmented generation systems by mapping text into vector space
Why embedding quality determines whether the right documents are retrieved
Choosing between commercial, open-source, and domain-specific embedding models based on your domain, constraints, and goals
Designing hybrid and multi-stage retrieval pipelines that combine dense, sparse, and reranking components for better precision
A hands-on walkthrough for building a hybrid retriever for an employee policy chatbot

A major healthcare company launched a chatbot designed to help patients navigate their insurance plans. It was built using retrieval-augmented generation (RAG), backed by a well-trained LLM and connected to internal policy documents (figure 8.1). On paper, everything looked solid. The system had access to accurate information, and the model could generate fluent, helpful responses.

Figure 8.1 Document embedding in RAG systems

But shortly after launch, users began reporting problems. The chatbot frequently failed to answer questions that should have been easy. When someone asked whether physical therapy after surgery was covered, the model gave either a vague reply or an outright “I don’t know,” even though the answer was clearly stated in the documents it had access to. In some cases, it gave outdated or incorrect information.

8.1 What are embeddings (really)?

8.1.1 A mental model: Embeddings as coordinates in meaning space

8.1.2 Why the model matters

8.2 Embedding models in production

8.2.1 Commercial models: Powerful but opaque

8.2.2 Open-source models: Flexible and transparent

8.2.3 Domain-specific models: Purpose-built precision

8.2.4 How to choose the right model

8.2.5 Embeddings across industries

8.2.6 Case study: Learning from Airbnb's embedding journey

8.3 Beyond simple vectors: Hybrid and multi-stage retrieval

8.3.1 The limitations of pure vector search

8.3.2 Hybrid retrieval: Combining dense and sparse approaches

8.3.3 Multi-stage retrieval: Building precision through layers

8.3.4 Building a hybrid retriever: Hands-on implementation for an employee chatbot

8.4 Essential retrieval optimizations with embeddings and RAG

8.4.1 Metadata filtering: Beyond pure semantic search

8.4.2 Chunking strategies: Finding the right granularity

8.5 Exploring vector storage and databases

8.5.1 HNSW: The algorithm powering modern vector search

8.5.2 FAISS: Industry-standard vector indexing

8.5.3 Vector databases: Complete vector storage solutions

8.5.4 Vector database project: Building a healthcare policy assistant with Pinecone

8.6 Practical challenges: Drift and compression

8.6.1 Diving deeper into vector compression

8.7 Summary

8.8 References