2 The brain of AI agents: LLMs


This chapter covers

  • Core capabilities of LLMs
  • Selecting the right LLM
  • Using LLM APIs
  • Prompt engineering techniques
  • Hands-on problem solving using a GAIA benchmark problem

Throughout this book, we’ll build a Research Agent (RA) and use it as a concrete thread to ground the concepts. In our case, the RA needs to interpret requests like “survey recent work on X” or “extract key findings from these PDFs” and make decisions like which sources to trust or when to ask clarifying questions.

Let’s turn our attention to the LLM, the “brain” of an LLM agent, shown as the core component in figure 2.1. The LLM acts as the reasoning engine that powers the entire agent system: it interprets user requests, orchestrates interactions with tools (component 2), and drives the agent’s decision-making. Together with tools, the LLM forms the foundation of a basic agent (component 3), which we’ll construct throughout these initial chapters.

Figure 2.1 The LLM serves as the reasoning engine for AI agents.
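To make the reasoning-engine role concrete before we dive into the details, here is a minimal sketch of the loop figure 2.1 describes: the LLM interprets a request, decides whether a tool is needed, and either calls one or answers directly. Everything here (`fake_llm`, `TOOLS`, `run_agent`) is illustrative scaffolding invented for this sketch, not code from a real LLM API; we replace the stand-in with real chat-completion calls later in the chapter.

```python
# Illustrative sketch only: fake_llm stands in for a real chat-completion
# call, and TOOLS stands in for the agent's tool registry (component 2).

def fake_llm(messages):
    """Stand-in for an LLM call: inspects the last user message and
    either requests a tool or answers directly."""
    last = messages[-1]["content"]
    if "survey" in last.lower():
        # The "LLM" decides a search tool is needed for survey requests.
        return {"tool": "search_papers", "args": {"query": last}}
    return {"answer": f"Direct response to: {last}"}

# Hypothetical tool registry mapping tool names to callables.
TOOLS = {
    "search_papers": lambda query: [f"Paper related to {query!r}"],
}

def run_agent(user_request):
    messages = [{"role": "user", "content": user_request}]
    decision = fake_llm(messages)           # 1. LLM interprets the request
    if "tool" in decision:                  # 2. LLM orchestrates a tool call
        result = TOOLS[decision["tool"]](**decision["args"])
        return {"source": decision["tool"], "result": result}
    return {"source": "llm", "result": decision["answer"]}  # 3. Direct answer
```

The key design point, which the rest of the chapter builds on, is that the LLM does not execute tools itself; it only decides which tool to invoke and with what arguments, while the surrounding agent code performs the call.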

2.1 Choosing LLMs for agents

2.1.1 Using closed LLMs for agents

2.1.2 Comparing closed LLMs

2.2 LLM API basics for building agents

2.2.1 Responding to user requests: Chat completion

2.2.2 Conversation management: Short-term memory

2.2.3 Structured output: A bridge for system integration

2.2.4 Asynchronous API calls: Handling multiple requests at once

2.3 Enhancing agent intelligence: Prompt engineering

2.3.1 Prompt engineering for agents

2.4 Solving problems with LLMs

2.4.1 LLM performance from experiments: The power and limits of prompting, and why tool use matters

2.5 Summary