8 Embeddings, Vector Databases and Retrieval

 

This chapter covers

  • How embedding models power retrieval-augmented generation systems by mapping text into vector space
  • Why embedding quality determines whether the right documents are retrieved
  • Choosing between commercial, open-source, and domain-specific embedding models based on your domain, constraints, and goals
  • Designing hybrid and multi-stage retrieval pipelines that combine dense, sparse, and reranking components for better precision
  • A hands-on walkthrough for building a hybrid retriever for an employee policy chatbot

A major healthcare company launched a chatbot designed to help patients navigate their insurance plans. It used retrieval-augmented generation (RAG), pairing a well-trained LLM with the company's internal policy documents (figure 8.1). On paper, everything looked solid. The system had access to accurate information, and the model could generate fluent, helpful responses.

Figure 8.1 Document embedding in RAG systems

But shortly after launch, users began reporting problems. The chatbot frequently failed to answer questions that should have been easy. When someone asked whether physical therapy after surgery was covered, the model gave either a vague reply or an outright “I don’t know,” even though the answer was clearly stated in the documents it had access to. In some cases, it gave outdated or incorrect information.

8.1 What are embeddings (really)?

8.1.1 A mental model: Embeddings as coordinates in meaning space

8.1.2 Why the model matters

8.2 Embedding models in production

8.2.1 Commercial models: Powerful but opaque

8.2.2 Open-source models: Flexible and transparent

8.2.3 Domain-specific models: Purpose-built precision

8.2.4 How to choose the right model

8.2.5 Practical applications across industries

8.2.6 Case study: Learning from Airbnb's embedding journey

8.3 Beyond simple vectors: Hybrid and multi-stage retrieval

8.3.1 The limitations of pure vector search

8.3.2 Hybrid retrieval: Combining dense and sparse approaches

8.3.3 Multi-stage retrieval: Building precision through layers

8.3.4 Building a hybrid retriever: Hands-on implementation for an employee chatbot

8.4 Essential retrieval optimizations with embeddings and RAG

8.4.1 Metadata filtering: Beyond pure semantic search

8.4.2 Chunking strategies: Finding the right granularity

8.5 Exploring vector storage and databases

8.5.1 HNSW: The algorithm powering modern vector search

8.5.2 FAISS: Industry-standard vector indexing

8.5.3 Vector databases: Complete vector storage solutions

8.5.4 Vector database project: Building a healthcare policy assistant with Pinecone

8.6 Practical challenges: Drift and compression

8.6.1 Compression: Getting more from less

8.7 Summary

8.8 References