chapter eleven

11 Building a causal inference workflow

This chapter covers

Building a causal analysis workflow
Estimating causal effects with DoWhy
Estimating causal effects using machine learning methods
Causal inference with causal latent variable models

Recall the causal inference workflow I introduced in chapter 10, shown again in figure 11.1.

Figure 11.1 A workflow for a causal inference analysis.

In this chapter, we’ll build out this workflow in full. We’ll focus on one type of query in particular: causal effects. But the workflow generalizes to all causal queries.

We focus on causal effect inference, namely the estimation of average treatment effects (ATEs) and conditional average treatment effects (CATEs), because these are the most popular causal queries.
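To make these two quantities concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made up for illustration: the variable names, the data-generating process, and the effect sizes. Treatment is randomized in this simulation, so a simple difference in group means estimates the ATE, and the same difference computed within each level of a covariate estimates the CATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (an assumption, for illustration only).
x = rng.integers(0, 2, size=n)   # binary covariate
t = rng.integers(0, 2, size=n)   # randomized binary treatment
# The treatment effect is 2.0 when x == 0 and 3.0 when x == 1.
y = 1.0 + 2.0 * t + 1.0 * t * x + rng.normal(size=n)


def diff_in_means(outcome, treatment):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    return outcome[treatment == 1].mean() - outcome[treatment == 0].mean()


# ATE: the effect averaged over the whole population.
ate_hat = diff_in_means(y, t)

# CATE: the effect averaged within each subpopulation defined by x.
cate_hat = {v: diff_in_means(y[x == v], t[x == v]) for v in (0, 1)}

print(f"ATE estimate: {ate_hat:.2f}")
for v, est in cate_hat.items():
    print(f"CATE estimate for x={v}: {est:.2f}")
```

In this simulation, the true ATE is 2.5 (the average of the two subgroup effects, since x is 0 or 1 with equal probability), and the true CATEs are 2.0 and 3.0. The difference-in-means shortcut works only because treatment is randomized here. With observational data it can fail, which is why the workflow includes an identification step before estimation.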

Refresher: Why ATEs and CATEs dominate

Estimating ATEs and CATEs is the most popular causal effect inference task. Some of the reasons include:

(1) We can rely on causal effect inference techniques when randomized experiments are not feasible, ethical, or possible.

(2) We can use causal effect inference techniques to address practical issues with real-world experiments (e.g., post-randomization confounding, attrition, spillover, and missing data).

(3) In an era where companies can run many different digital experiments in online applications and stores, causal effect inference techniques can help prioritize experiments, reducing opportunity cost.

11.1 Step 1: Select the query

11.2 Step 2: Build the model

11.3 Step 3: Identify the estimand

11.3.1 The backdoor adjustment estimand

11.3.2 The instrumental variable estimand

11.3.3 The front-door adjustment estimand

11.3.4 Choosing estimands and reducing “DAG anxiety”

11.3.5 When you don’t have identification

11.4 Step 4: Estimate the estimand

11.4.1 Linear regression estimation of the backdoor estimand

11.4.2 Propensity score estimators of the backdoor estimand

11.4.3 Backdoor estimation with machine learning

11.4.4 Front-door estimation

11.4.5 Instrumental variable methods

11.4.6 Comparing and selecting estimators

11.5 Step 5: Refutation

11.5.1 Data size reduction

11.5.2 Adding a dummy confounder

11.5.3 Replacing treatment with a dummy

11.5.4 Replacing outcome with a dummy outcome

11.5.5 Testing robustness to unmodeled confounders

11.6 Causal inference with causal generative models

11.6.1 Transformations for causal inference

11.6.2 Steps for inferring a causal query with a causal generative model

11.6.3 Extending inference to estimation

11.6.4 A VAE-inspired model for causal inference

11.7 Summary