4 Building Responses and Self-Hosted Clients
This chapter covers
- Using the Responses API for single and background calls
- Creating a ResponsesClient for OpenAI and Azure OpenAI
- Running Robby on self-hosted Ollama and ONNX models
- Handling failures with finish reasons and exceptions
We have already introduced Robby as a simple agent; now we focus on how Robby talks to models in production. The Responses API extends traditional chat interfaces with server-managed state and background processing, making it easier to handle long-running AI tasks without blocking your application. Self-hosted clients give you full control over the model infrastructure, whether running locally through Ollama or embedded in your process via ONNX.
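To make the background-processing idea concrete, here is a minimal sketch in Python. It assumes the official `openai` package and a valid API key; the model name and prompt are illustrative only. A background call returns immediately with a response in the `queued` state, and you poll until the server finishes the work:

```python
# Sketch: submitting a background request with the Responses API and
# polling until it completes, without blocking on the initial call.
# Assumes the `openai` package and an OPENAI_API_KEY; model is illustrative.
import time


def wait_for_response(retrieve, response_id, interval=2.0, timeout=600.0):
    """Poll `retrieve(response_id)` until the response leaves the
    'queued'/'in_progress' states, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = retrieve(response_id)
        if resp.status not in ("queued", "in_progress"):
            return resp  # completed, failed, cancelled, or incomplete
        time.sleep(interval)
    raise TimeoutError(f"response {response_id} did not finish in {timeout}s")


# Usage (requires network access and a valid key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.responses.create(
#     model="gpt-4o",
#     input="Summarize the attached report.",
#     background=True,          # return immediately; server keeps working
# )
# done = wait_for_response(lambda rid: client.responses.retrieve(rid), resp.id)
# print(done.output_text)
```

Because the server owns the state, the polling loop needs only the response ID; your application can shut down and resume, or hand the ID to another process, and still collect the result later.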