6 The logic of multi-stage decision processes: Richard Bellman and the principle of recursive optimization
This chapter covers
- Richard Bellman’s Dynamic Programming (1957), which introduced the field and defined the principle of optimality for sequential decision-making
- Bellman’s dynamic programming paradigm: recursion, state representation, transitions, base cases, memoization, and tabulation
- Foundational examples that demonstrate the breadth and power of dynamic programming
- Real-world applications across machine learning, artificial intelligence, and beyond
- The Bellman equation, Markov decision processes, and the enduring challenges of scale—the “curse of dimensionality”
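As a preview of the two implementation styles named above, here is a minimal, illustrative sketch (the chapter develops both in detail, starting with Fibonacci in section 6.2). The function names are placeholders chosen for this sketch:

```python
from functools import lru_cache

# Top-down style: plain recursion plus a cache (memoization).
@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    if n < 2:                 # base cases: F(0) = 0, F(1) = 1
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up style: fill a table from the base cases upward (tabulation).
def fib_tab(n: int) -> int:
    table = [0, 1]            # table[i] holds F(i)
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_memo(30), fib_tab(30))  # both print 832040
```

Both versions turn an exponential-time recursion into a linear-time computation by ensuring each subproblem is solved only once.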
We have so far explored several frameworks for reasoning under uncertainty—methods for estimating probabilities, testing hypotheses, and measuring information. But the most difficult decisions are rarely resolved in a single step. More often, they unfold sequentially, with each choice reshaping what remains possible. What the frameworks of Bayes, Fisher, and their contemporaries lacked was a systematic way to reason about such multi-stage decisions, where today’s action constrains tomorrow’s options.
6.1 The dynamic programming paradigm
6.1.1 Structural prerequisites for dynamic programming
6.1.2 Recursive formulations: expressing problems as recurrences
6.1.3 States, transitions, and base cases: the anatomy of a dynamic programming solution
6.1.4 Implementation methods: memoization and tabulation
6.1.5 Trade-offs and best practices
6.1.6 From principles to practice
6.2 Recurrence basics: the Fibonacci sequence
6.2.1 Fibonacci as a dynamic programming problem
6.3 Graph decomposition: the shortest-route problem
6.4 Counting solutions: the coin change problem
6.5 Constrained optimization: the knapsack problem
6.6 Efficacy of dynamic programming
6.7 Enduring value: dynamic programming in real-world systems
6.7.1 Constrained optimization
6.7.2 Inventory control
6.7.3 Shortest routes and networks
6.7.4 Bioinformatics and string matching
6.7.5 Machine learning and AI
6.8 Markov decision processes and the Bellman equation
6.8.1 Markov decision processes explained
6.8.2 The Bellman equation: recursive decomposition of value
6.8.3 From dynamic programming to reinforcement learning
6.9 Pitfalls and challenges of dynamic programming
6.9.1 State explosion and the curse of dimensionality
6.9.2 Memory versus computation trade-offs
6.9.3 Defining states and the Markov property
6.10 Synthesis and future directions