12 Causal decisions and reinforcement learning

This chapter covers

  • Using causal models to automate decisions
  • Setting up causal bandit algorithms
  • Incorporating causality into reinforcement learning

When we apply methods from statistics and machine learning, it is typically in service of making or automating decisions. Algorithms for automated decision-making, such as bandit and reinforcement learning (RL) algorithms, involve agents that learn how to make good decisions. In both settings, decision-making is fundamentally a causal problem: taking a course of action leads to consequences, and the objective is to choose the action whose consequences are most favorable to the decision-maker. That motivates a causal framing.
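
In the terms of causal decision theory (the subject of section 12.2), this amounts to choosing the action that maximizes expected utility under intervention rather than under passive observation. As a sketch in Pearl's do-notation (our notation here, not the chapter's own derivation), with U the decision-maker's utility and A the action variable:

  a^* = \arg\max_a \; \mathbb{E}[U \mid \mathrm{do}(A = a)]

Contrast this with the non-causal expectation \mathbb{E}[U \mid A = a], which conditions on observing the action rather than taking it; section 12.2.4 examines when the argmax values of the two expectations align.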

Often, the path from action to consequences has a degree of randomness. For example, your choice of how to play a hand of poker may be optimal, but you still might lose due to chance. That motivates a probabilistic modeling approach.
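
To make this concrete, here is a minimal sketch (plain Python with NumPy, using made-up payoff probabilities rather than code from this chapter) of a two-armed bandit in which the optimal action still loses on individual plays, even though it dominates in expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 0 pays off 60% of the time,
# arm 1 only 40% of the time, so arm 0 is the optimal decision.
payoff_probs = [0.6, 0.4]

def pull(arm: int) -> int:
    """Reward of 1 with the arm's payoff probability, else 0."""
    return int(rng.random() < payoff_probs[arm])

# The optimal action still produces losses on individual plays...
print([pull(0) for _ in range(10)])  # mix of 1s and 0s

# ...but dominates in expectation over many plays.
n = 10_000
print(sum(pull(0) for _ in range(n)) / n)  # close to 0.6
print(sum(pull(1) for _ in range(n)) / n)  # close to 0.4
```

An agent judging arms from a single play would often be misled by chance; averaging over repeated plays recovers the expected reward that a decision rule should target, which is why the probabilistic framing matters.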

12.1 A causal primer on decision theory

12.1.1 Utility, reward, loss, and cost

12.1.2 Uncertainty comes from other causes

12.2 Causal decision theory

12.2.1 Decisions as a level 2 query

12.2.2 Causal characterization of decision rules and policies

12.2.3 Causal probabilistic decision-modeling and admissibility

12.2.4 The deceptive alignment of argmax values of causal and non-causal expectations

12.2.5 Newcomb’s paradox

12.2.6 Newcomb’s paradox with a causal model

12.2.7 Introspection in causal decision theory

12.3 Causal DAGs and sequential decisions

12.3.1 Bandit feedback

12.3.2 Contextual bandit feedback

12.3.3 Delayed feedback

12.3.4 Causal queries on a sequential model