5 Reasoning: How your agent decides what to do next
This chapter covers
- Diagnosing reasoning depth in production agent systems
- Building Chain-of-Thought traces that make reasoning auditable and debuggable
- Implementing Complexity-Based Routing to match thinking depth to problem difficulty
- Exploring multiple solution paths with Parallel Exploration using tree search
- Grounding decisions in empirical evidence through Iterative Hypothesis Testing
- Upgrading Argus with reasoning traces and depth-calibrated review
"Solving a problem simply means representing it so as to make the solution transparent."
— Herbert Simon, The Sciences of the Artificial (1969)
We shipped Argus with extended thinking fully enabled: a 128K-token thinking budget for every pull request. The reviews were brilliant. A three-line whitespace fix received a 4,000-word analysis covering edge cases the code would never encounter. A typo in a README triggered a meditation on documentation philosophy that cited four academic papers. The senior engineers loved the thoroughness until the first monthly invoice arrived.
At 100 reviews per day and $0.19 per review, we were burning $570 a month on reasoning. I pulled the logs and counted. Seventy-three percent of our PRs were simple (formatting, dependency bumps, one-line fixes), yet they consumed the same 128K thinking budget as the complex architectural changes. We were paying Michelin-star prices for every meal, including the ones that should have been a sandwich.
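To make the arithmetic concrete, here is a back-of-the-envelope cost model built from the numbers above. It is a sketch under stated assumptions, not Argus's billing logic: it assumes a 30-day month and, for the routed case, that per-review cost scales linearly with the thinking budget (this ignores input tokens and the fact that the model may not spend its full budget). The 8K budget for simple PRs and the name `monthly_cost` are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope cost model for the Argus numbers in the text.
# ASSUMPTIONS (not from the source): a 30-day month, and per-review cost
# that scales linearly with the thinking budget (input tokens ignored).

FULL_BUDGET = 128_000    # thinking tokens per review, from the text
COST_PER_REVIEW = 0.19   # dollars per review at the full budget, from the text
REVIEWS_PER_DAY = 100    # from the text
DAYS_PER_MONTH = 30      # assumption
SIMPLE_SHARE = 0.73      # fraction of PRs that were simple, from the logs


def monthly_cost(simple_budget: int = FULL_BUDGET) -> float:
    """Monthly spend in dollars if simple PRs get `simple_budget` thinking
    tokens and complex PRs keep the full budget (hypothetical helper)."""
    cost_per_token = COST_PER_REVIEW / FULL_BUDGET
    simple = SIMPLE_SHARE * simple_budget * cost_per_token
    complex_ = (1 - SIMPLE_SHARE) * FULL_BUDGET * cost_per_token
    return (simple + complex_) * REVIEWS_PER_DAY * DAYS_PER_MONTH


if __name__ == "__main__":
    # Uniform budget: every PR gets 128K thinking tokens.
    print(f"uniform budget:    ${monthly_cost():,.2f}")
    # Hypothetical routing: simple PRs get only 8K thinking tokens.
    print(f"8K for simple PRs: ${monthly_cost(8_000):,.2f}")
```

Under the linear-cost assumption, cutting the budget for the simple 73% of PRs removes most of the spend while leaving the complex reviews untouched, which is the intuition behind matching thinking depth to problem difficulty.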