6 The logic of multi-stage decision processes: Richard Bellman and the principle of recursive optimization
This chapter covers
- Richard Bellman’s Dynamic Programming (1957), which introduced the field and defined the principle of optimality for sequential decision-making
- Bellman’s dynamic programming paradigm: recursion, state representation, transitions, base cases, memoization, and tabulation
- Foundational examples that demonstrate the breadth and power of dynamic programming
- Real-world applications across machine learning, artificial intelligence, and beyond
- The Bellman equation, Markov decision processes, and the enduring challenges of scale—the “curse of dimensionality”
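As a preview of the two implementation styles named above, here is a minimal, illustrative sketch (the chapter develops both in detail, starting with Fibonacci in section 6.2). The function names are placeholders chosen for this sketch:

```python
from functools import lru_cache

# Top-down style: plain recursion plus a cache (memoization).
@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    if n < 2:                 # base cases: F(0) = 0, F(1) = 1
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up style: fill a table from the base cases upward (tabulation).
def fib_tab(n: int) -> int:
    table = [0, 1]            # table[i] holds F(i)
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

print(fib_memo(30), fib_tab(30))  # both print 832040
```

Both versions turn an exponential-time recursion into a linear-time computation by ensuring each subproblem is solved only once.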
We have so far explored several frameworks for reasoning under uncertainty—methods for estimating probabilities, testing hypotheses, and measuring information. But the most difficult decisions are rarely resolved in a single step. More often, they unfold sequentially, with each choice reshaping what remains possible. What the frameworks of Bayes, Fisher, and their contemporaries lacked was a systematic way to reason about such multi-stage decisions, where today’s action constrains tomorrow’s options.
6.1 The dynamic programming paradigm
6.1.1 Structural prerequisites for dynamic programming
6.1.2 Recursive formulations: expressing problems as recurrences
6.1.3 States, transitions, and base cases: the anatomy of a dynamic programming solution
6.1.4 Implementation methods: memoization and tabulation
6.1.5 Trade-offs and best practices
6.1.6 From principles to practice
6.2 Recurrence basics: the Fibonacci sequence
6.2.1 Fibonacci as a dynamic programming problem
6.3 Graph decomposition: the shortest-route problem
6.4 Counting solutions: the coin change problem
6.5 Constrained optimization: the knapsack problem
6.6 Efficacy of dynamic programming
6.7 Enduring value: dynamic programming in real-world systems
6.7.1 Constrained optimization
6.7.2 Inventory control
6.7.3 Shortest routes and networks
6.7.4 Bioinformatics and string matching
6.7.5 Machine learning and AI
6.8 Markov decision processes and the Bellman equation
6.8.1 Markov decision processes explained
6.8.2 The Bellman equation: recursive decomposition of value
6.8.3 From dynamic programming to reinforcement learning
6.9 Pitfalls and challenges of dynamic programming
6.9.1 State explosion and the curse of dimensionality
6.9.2 Memory versus computation trade-offs
6.9.3 Defining states and the Markov property
6.10 Synthesis and future directions