chapter four

4 The Session Service: Teaching your AI to remember

 

This chapter covers

  • Designing the Session Service to store and retrieve conversation history
  • Understanding what a session contains: messages, roles, tool calls, and metadata
  • Defining the gRPC contract that other services use to interact with sessions
  • Abstracting session storage so teams can choose their backend
  • Implementing a complete PostgreSQL backend for session persistence
  • Connecting the Session Service to the SDK
  • Extending the Session Service with model-managed memory
  • Managing context windows with strategies like summarization, hierarchical memory, and retrieval-augmented approaches

In this chapter we will build the Session Service, one half of what Chapter 1 called "context-aware intelligence" (the Data Service, which handles organizational knowledge, is the other half). The Session Service provides conversation memory: the ability to remember what's been said so that follow-up questions make sense and the assistant can reference earlier parts of the conversation. This capability transforms a stateless AI system into something genuinely useful. When a patient asks, "What documents do I need?" and then follows up with "What about for my child?", the assistant understands that "what" refers to documents because it remembers the previous exchange.

4.1 What a session contains

4.2 The session service contract

4.3 Storage abstraction

4.3.1 Choosing the right database

4.3.2 The storage interface

4.4 Implementing the Session Service backend

4.4.1 The Database Schema

4.4.2 The Storage Implementation

4.4.3 The gRPC service implementation

4.5 Integrating with the SDK

4.6 Model-managed memory

4.6.1 Where memory lives

4.6.2 The memory model

4.6.3 Extending the storage interface for memories

4.6.4 SDK Integration

4.6.5 A complete workflow example

4.7 Managing context windows

4.7.1 The token budget problem

4.7.2 Simple truncation

4.7.3 Compressing history with summarization

4.7.4 Hierarchical memory

4.7.5 Retrieval-augmented memory

4.7.6 Putting it together

4.8 Summary