
4 Building Responses and Self-Hosted Clients


This chapter covers

  • Using the Responses API for single and background calls
  • Creating ResponsesClient for OpenAI and Azure OpenAI
  • Running Robby on self-hosted Ollama and ONNX models
  • Handling failures with finish reasons and exceptions

We already introduced Robby as a simple agent. Now we focus on how Robby talks to models in production. The Responses API extends traditional chat interfaces with server-managed state and background processing, making it easier to handle long-running AI tasks without blocking your application. Self-hosted clients give you complete control over the model infrastructure, running locally through Ollama or embedded in your process via ONNX.
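Background processing typically follows a submit-then-poll pattern: create the response with background mode enabled, then periodically retrieve it until it reaches a terminal status. The sketch below shows a minimal polling loop; the `retrieve` callable and the dict-shaped result are illustrative assumptions standing in for whatever client you use, while the status names (`queued`, `in_progress`, `completed`) follow the Responses API lifecycle.

```python
import time

def poll_until_done(retrieve, response_id, interval=2.0, timeout=120.0):
    """Poll a background response until it leaves its pending states.

    `retrieve` is any callable mapping a response id to a dict with a
    "status" key -- a hypothetical stand-in for a real client's
    retrieve call, so the loop works against any backend.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = retrieve(response_id)
        if resp["status"] not in ("queued", "in_progress"):
            return resp  # terminal: completed, failed, cancelled, ...
        time.sleep(interval)  # back off before asking the service again
    raise TimeoutError(f"response {response_id} still running after {timeout}s")
```

In production, `retrieve` would wrap something like a real client's retrieve-by-id call; keeping it as a parameter makes the loop trivial to unit-test with a fake.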

4.1 What Is the Responses API?

4.1.1 Introduction to ResponsesClient

4.2 Creating and Configuring ResponsesClient

When to use the Responses API: use it for long-running tasks that may exceed typical timeout limits, scenarios where service-managed state simplifies your architecture, background processing with polling patterns, or cases where you want reduced boilerplate for common conversational workflows.

4.2.1 OpenAI Responses Client
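Whatever client wrapper you configure, the call ultimately becomes an HTTP POST to the service's Responses endpoint. The following sketch builds such a request with only the Python standard library, without sending it; the endpoint URL and the `model`/`input`/`background` fields follow the public OpenAI REST API, while the helper name itself is our own.

```python
import json
import urllib.request

OPENAI_RESPONSES_URL = "https://api.openai.com/v1/responses"

def build_responses_request(api_key: str, model: str, user_input: str,
                            background: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Responses API."""
    body = {"model": model, "input": user_input}
    if background:
        body["background"] = True  # ask the service to run the call server-side
    return urllib.request.Request(
        OPENAI_RESPONSES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Separating request construction from transmission keeps the wire format visible and lets you inspect or log exactly what a configured client would send.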

4.2.2 OpenAI Responses Client with Background Response

4.2.3 Azure OpenAI Responses Client

4.3 Creating and Configuring Self-Hosted Clients

4.3.1 Ollama Client

4.3.2 ONNX Client

4.4 Exception Management and Error Handling

4.4.1 Exception Types and Scenarios

4.4.2 Best Practices for Exception Handling

4.5 Summary