chapter five

5 Reasoning: How your agent decides what to do next

 

This chapter covers

  • Diagnosing reasoning depth in production agent systems
  • Building Chain-of-Thought traces that make reasoning auditable and debuggable
  • Implementing Complexity-Based Routing to match thinking depth to problem difficulty
  • Exploring multiple solution paths with Parallel Exploration using tree search
  • Grounding decisions in empirical evidence through Iterative Hypothesis Testing
  • Upgrading Argus with reasoning traces and depth-calibrated review

"Solving a problem simply means representing it so as to make the solution transparent."

— Herbert Simon, The Sciences of the Artificial (1969)

We shipped Argus with full extended thinking enabled, 128K thinking tokens for every pull request. The reviews were brilliant. A three-line whitespace fix received a 4,000-word analysis covering edge cases the code would never encounter. A typo in a README triggered a meditation on documentation philosophy that cited four academic papers. The senior engineers loved the thoroughness until the first monthly invoice arrived.

At 100 reviews per day and $0.19 per review, we were burning $19 a day on reasoning, which added up to $570 a month. I pulled the logs and counted: 73% of our PRs were simple (formatting, dependency bumps, one-line fixes), yet they consumed the same 128K thinking budget as the complex architectural changes. We were paying Michelin-star prices for every meal, including ones that should have been a sandwich.
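The arithmetic behind that invoice is easy to check. The sketch below uses only the figures from the story (100 reviews a day, $0.19 per review, 73% simple PRs). The 30-day month, the flat per-review cost, and all the names are assumptions made for illustration, and amounts are kept in integer cents to avoid floating-point rounding.

```python
# Back-of-the-envelope check of the reasoning bill.
# Figures come from the text; the 30-day month is an assumption.
REVIEWS_PER_DAY = 100
COST_PER_REVIEW_CENTS = 19   # $0.19 per review, stored as integer cents
SIMPLE_PR_PERCENT = 73       # share of PRs that were simple
DAYS_PER_MONTH = 30          # assumption: a 30-day billing month


def monthly_spend_cents(reviews_per_day: int, cost_cents: int, days: int) -> int:
    """Total spend for the month, assuming every review costs the same."""
    return reviews_per_day * cost_cents * days


def simple_share_cents(total_cents: int, simple_percent: int) -> int:
    """Portion of the total spent on simple PRs, rounded down to a cent."""
    return total_cents * simple_percent // 100


def dollars(cents: int) -> str:
    """Format integer cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


daily = REVIEWS_PER_DAY * COST_PER_REVIEW_CENTS
monthly = monthly_spend_cents(REVIEWS_PER_DAY, COST_PER_REVIEW_CENTS, DAYS_PER_MONTH)
on_simple = simple_share_cents(monthly, SIMPLE_PR_PERCENT)

print("Daily spend:        ", dollars(daily))
print("Monthly spend:      ", dollars(monthly))
print("Spent on simple PRs:", dollars(on_simple))
```

Under those assumptions, roughly three quarters of the monthly bill went to pull requests that needed almost no deliberation. That share is a ceiling on what matching depth to difficulty could save, not a prediction, because even a simple review still costs something.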

5.1 What is reasoning? How agents think, and how they decide how deeply to think

5.1.1 Two systems of thought

5.1.2 The reasoning landscape has shifted

5.1.3 Why architectural patterns still matter

5.1.4 The reasoning patterns at a glance

5.1.5 Testing and observing reasoning

5.2 Pattern: Chain-of-Thought

5.2.1 Thinking out loud: The step-by-step chain

5.2.2 In Production: Claude Code's implicit chain-of-thought

5.2.3 Building it

5.2.4 Argus integration

5.2.5 When it breaks

5.3 Pattern: Complexity-Based Routing

5.3.1 Three tiers of reasoning depth

5.3.2 In Production: The Planner-Worker economic split

5.3.3 Building it

5.3.4 Argus integration

5.3.5 When it breaks

5.4 Pattern: Parallel Exploration

5.4.1 Branching, scoring, and pruning

5.4.2 In Production: Claude Code's implicit parallel exploration

5.4.3 Building it

5.4.4 When it breaks

5.5 Pattern: Iterative Hypothesis Testing

5.5.1 The hypothesis-experiment loop