5 Crafting Agents from Scratch
This chapter covers
- From chatbots to agents
- Building a ChatCompletionAgent
- Configuring agent instructions, output format, streaming
- Working with agent sessions
With a conventional chatbot that powers Robby, we can generate smart answers, but each response is isolated. Robby can reason about driving, describe his surroundings, or read a weather report, one question at a time, with no real memory and no built-in way to act in the physical world. We can manually carry chat history around and pass extra options to the model, but that is plumbing we build ourselves, not a first-class abstraction. This approach starts to break down once we handle dynamic, multi-step problems that need persistent context, tool calls, and coordinated reasoning across several steps or several specialized components.
Now consider a single objective such as safely traversing difficult terrain. That is not one prompt and one answer. It is a chain of decisions, checks, and actions: read sensors, choose a path, adjust speed, re-check safety, and so on. At that point we do not want Robby to behave like a stateless completion engine. We want him to have memory across turns, tools he can call, and the ability to work in a team of focused agents.