12 Causal decisions and reinforcement learning
This chapter covers
- Using causal models to automate decisions
- Setting up causal bandit algorithms
- Incorporating causality into reinforcement learning
When we apply methods from statistics and machine learning, it is typically in service of making a decision or automating decision-making. Algorithms for automated decision-making, such as bandit and reinforcement learning (RL) algorithms, involve agents that learn how to make good decisions. Whether a person or an automated agent makes the decision, decision-making is fundamentally a causal problem: a decision to take some course of action leads to consequences, and the objective is to choose the action that leads to consequences favorable to the decision-maker. That motivates a causal framing.
Often, the path from action to consequences has a degree of randomness. For example, your choice of how to play a hand of poker may be optimal, but you still might lose due to chance. That motivates a probabilistic modeling approach.