12 Causal decisions and reinforcement learning

This chapter covers

  • Using causal models to automate decisions
  • Setting up causal bandit algorithms
  • Incorporating causality into reinforcement learning

When we apply methods from statistics and machine learning, it is typically in service of making or automating decisions. Algorithms for automated decision-making, such as bandit and reinforcement learning (RL) algorithms, involve agents that learn how to make good decisions. In both settings, decision-making is fundamentally a causal problem: taking a course of action leads to consequences, and the objective is to choose the action whose consequences are most favorable to the decision-maker. That motivates a causal framing.
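
In the terms of causal decision theory (the subject of section 12.2), this amounts to choosing the action that maximizes expected utility under intervention rather than under passive observation. As a sketch in Pearl's do-notation (our notation here, not the chapter's own derivation), with U the decision-maker's utility and A the action variable:

  a^* = \arg\max_a \; \mathbb{E}[U \mid \mathrm{do}(A = a)]

Contrast this with the non-causal expectation \mathbb{E}[U \mid A = a], which conditions on observing the action rather than taking it; section 12.2.4 examines when the argmax values of the two expectations align.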

Often, the path from action to consequences has a degree of randomness. For example, your choice of how to play a hand of poker may be optimal, but you still might lose due to chance. That motivates a probabilistic modeling approach.
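
To make this concrete, here is a minimal sketch (plain Python with NumPy, using made-up payoff probabilities rather than code from this chapter) of a two-armed bandit in which the optimal action still loses on individual plays, even though it dominates in expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 0 pays off 60% of the time,
# arm 1 only 40% of the time, so arm 0 is the optimal decision.
payoff_probs = [0.6, 0.4]

def pull(arm: int) -> int:
    """Reward of 1 with the arm's payoff probability, else 0."""
    return int(rng.random() < payoff_probs[arm])

# The optimal action still produces losses on individual plays...
print([pull(0) for _ in range(10)])  # mix of 1s and 0s

# ...but dominates in expectation over many plays.
n = 10_000
print(sum(pull(0) for _ in range(n)) / n)  # close to 0.6
print(sum(pull(1) for _ in range(n)) / n)  # close to 0.4
```

An agent judging arms from a single play would often be misled by chance; averaging over repeated plays recovers the expected reward that a decision rule should target, which is why the probabilistic framing matters.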

12.1 A causal primer on decision theory

12.1.1 Utility, reward, loss, and cost

12.1.2 Uncertainty comes from other causes

12.2 Causal decision theory

12.2.1 Decisions as a level 2 query

12.2.2 Causal characterization of decision rules and policies

12.2.3 Causal probabilistic decision-modeling and admissibility

12.2.4 The deceptive alignment of argmax values of causal and non-causal expectations

12.2.5 Newcomb’s paradox

12.2.6 Newcomb’s paradox with a causal model

12.2.7 Introspection in causal decision theory

12.3 Causal DAGs and sequential decisions

12.3.1 Bandit feedback

12.3.2 Contextual bandit feedback

12.3.3 Delayed feedback

12.3.4 Causal queries on a sequential model