Part 2: AI Model Online Evaluations
It’s time to move past offline evaluations of a model and into online evaluations, most notably A/B testing. In this phase of model development, we make product changes that expose the model to users, whether directly or indirectly.
Online evaluations are where your metrics meet reality. They validate whether offline improvements translate into real-world impact, and they reveal metric trade-offs that couldn’t be uncovered offline. In Chapter 6, we’ll focus on how to evaluate models in an A/B test, covering experiment design and how to interpret ambiguous or low-signal results. Chapter 7 bridges the gap between offline and live testing, walking through model rollout strategies, internal beta testing, and how to connect offline metrics to online monitoring. Finally, Chapter 8 explores the pitfalls of online metrics.