3 Using offline evaluations as diagnostics
This chapter covers
- Defining diagnostic offline evaluations in more detail
- Illustrating real-world scenarios where diagnostics are useful
- Handling the common pitfalls of diagnostic offline evaluations
- Weighing the engineering considerations for incorporating diagnostic offline evaluations
In Chapter 2, we covered the basic principles of offline evaluation, including the anatomy of an offline evaluation: data, evaluation design, and metrics. We also introduced the different types of evaluations to show what’s possible. Next, we’ll explore diagnostic evaluations and unpack their value and applications in the context of machine learning-powered products.
The term diagnostic might sound clinical and unnerving, especially if you’re familiar with the medical field, but in the context of evaluating machine learning models, it’s really about uncovering hidden insights and understanding why a model behaves the way it does. In fact, it’s closely related to the broader notion of AI explainability: the idea that we should be able to trace, interpret, and understand model outputs rather than treating the model as a black box.