chapter five
5 The Data Service: teaching AI what your organization knows
This chapter covers
- Designing the Data Service to give teams searchable knowledge indexes without building their own parsing, chunking, and embedding pipelines
- Organizing knowledge into isolated indexes so teams configure retrieval independently
- Building an ingestion pipeline that detects file formats, extracts text, and chunks documents into searchable pieces
- Generating embeddings through the Model Service to reuse its provider abstraction, fallback logic, and cost tracking
- Abstracting vector storage and search to support multiple backends, with a complete pgvector implementation
- Supporting hybrid retrieval by extending the vector store interface with optional keyword search
- Exposing the Data Service through the gRPC contract and platform SDK
An AI assistant that remembers your conversation but doesn't know your company's policies, products, or procedures is still going to make things up. It will hallucinate confidently about return windows, invent product features, and cite policies that don't exist. Conversational memory, which we built in Chapter 4, is only half the story. The other half is grounding: connecting AI applications to organizational knowledge so that responses reflect reality rather than plausible guesses.