chapter five

5 Crafting Agents from Scratch

 

This chapter covers

  • From chatbots to agents
  • Building a ChatCompletionAgent
  • Configuring agent instructions, output format, streaming
  • Working with agent sessions

With a conventional chatbot that powers Robby, we can generate smart answers, but each response is isolated. Robby can reason about driving, describe his surroundings, or read a weather report, one question at a time, with no real memory and no built-in way to act in the physical world. We can manually carry chat history around and pass extra options to the model, but that is plumbing we build ourselves, not a first-class abstraction. This approach starts to break down once we handle dynamic, multi-step problems that need persistent context, tool calls, and coordinated reasoning across several steps or several specialized components.

Now consider a single objective such as safely traversing difficult terrain. That is not one prompt and one answer. It is a chain of decisions, checks, and actions: read sensors, choose a path, adjust speed, re-check safety, and so on. At that point we do not want Robby to behave like a stateless completion engine. We want him to have memory across turns, tools he can call, and the ability to work in a team of focused agents.

5.1 Introducing Agents

5.1.1 What is ChatClientAgent?

5.1.2 Instructions

5.1.3 Streaming

5.1.4 Structured Output

5.2 Chat Messages

5.2.1 Conversations with Chat Messages

5.2.2 Multi-Modal Input

5.3 Agent Session

5.3.1 What is ChatClientAgentSession?

5.3.2 Multiple Turns on the Same Session

5.3.3 Session Serialization / Deserialization

5.3.4 Multiple Agents on the Same Session

5.3.5 Agents with Background Responses

5.4 Conversations API

5.4.1 In-Service Conversation

5.5 Conclusion

5.6 Summary