5 Counterfactual evaluations
This chapter covers
- Introducing causal inference
- Illustrating the anatomy of a counterfactual evaluation
- Understanding the strengths and limitations of counterfactual evaluations
- Logging best practices for counterfactual evaluations
What if you could play out different scenarios without them actually happening in real life? What if you had taken that new job instead of staying at your current one? Or ordered the other dish on the menu? This kind of 'what if' thinking is essentially what counterfactual evaluations allow us to do.
With the right data, counterfactual evaluations can help you understand what could have happened if an AI model had made a different decision. The last chapter detailed the engineering performance metrics that every AI practitioner should consider before introducing a model into a production setting, covering techniques like dark-loading, latency degradation, and the key metrics to consider when measuring system performance. In this chapter, we shift focus from the systems that surround a model back to the model itself by exploring another powerful methodology: counterfactual evaluations.
5.1 What is causal inference?
Let’s first talk about causal inference before we explore counterfactual evaluations. Causal inference is all about answering a simple yet profound question: what caused what?
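To give a first taste of what a counterfactual evaluation can compute, here is a minimal sketch of one common technique, inverse propensity scoring (IPS), which reweights logged outcomes to estimate how an alternative model would have performed. All names, data, and the trivial `new_policy` are hypothetical placeholders, not code from this book:

```python
# Hypothetical logged data from a deployed model: each record holds the
# action the model took, the probability the logging policy assigned to
# that action, and the observed reward (e.g., a click).
logs = [
    {"context": 0, "action": "A", "prob": 0.8, "reward": 1.0},
    {"context": 1, "action": "B", "prob": 0.2, "reward": 0.0},
    {"context": 2, "action": "A", "prob": 0.8, "reward": 1.0},
    {"context": 3, "action": "B", "prob": 0.2, "reward": 1.0},
]

def new_policy(context):
    # Hypothetical alternative model we want to evaluate offline:
    # for simplicity, it always chooses action "B".
    return "B"

def ips_estimate(logs, policy):
    """Inverse propensity scoring: keep only the logged decisions the new
    policy agrees with, and upweight their rewards by 1 / logged probability."""
    total = 0.0
    for rec in logs:
        # Indicator: does the new policy agree with the logged action?
        match = 1.0 if policy(rec["context"]) == rec["action"] else 0.0
        total += match * rec["reward"] / rec["prob"]
    return total / len(logs)

print(ips_estimate(logs, new_policy))  # estimated average reward of new_policy
```

The key ingredient is that the logging policy's action probabilities were recorded at decision time; without them, this kind of "what if" estimate is not possible, which is why logging practices matter so much for counterfactual evaluations.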