part one

Part 1: AI Model Offline Evaluations

In their simplest form, offline evaluations are metrics computed as changes are made to a machine learning model, measuring the effectiveness of a change without exposing it to users. In their most complex form, they are carefully designed simulations that approximate real-world impact, requiring thoughtful dataset construction, robust metric selection, and a deep understanding of user and model behavior to ensure the results are predictive of what will actually happen in production. That's what Part 1 of the book is all about: getting into the weeds of all types of offline evaluations.

In the opening chapter of this book, we’ll lay out how offline evaluations relate to the AI model development life cycle. Chapter 2 focuses on the anatomy of an evaluation, which will be relevant as we define each type of offline evaluation in the subsequent chapters. Chapter 3 illustrates the diagnostic offline evaluation approach, which is used to understand the behavior of a model. Chapter 4 is all about engineering system evaluations conducted before the model is introduced into a production, user-facing setting. Finally, in Chapter 5, we’ll turn to the counterfactual offline evaluation tactic.