6 Evaluating models in an A/B test
This chapter covers
- Defining the fundamentals of an A/B test
- Illustrating the quirks and characteristics of evaluating AI models in an A/B test setting
- Interpreting low-signal results from online evaluations
- Defining the right time to A/B test
What hasn’t already been said about A/B testing? It’s the conduit for innovation, for insights, for truly understanding the effect of a change on a product. A/B testing is one of the most critical steps not just in the model development lifecycle but in the development of any feature built for a user-facing product.
In Part 1 of this book, we detailed offline model evaluations, including diagnostic, performance, and counterfactual evaluations, but that’s only one part of a model evaluation strategy. It’s equally important to measure the effect of a model in an online setting, such as an A/B test. A/B testing, also known as online controlled experimentation, is now standard practice across the industry.
When you bring AI into the picture, however, this online experimentation methodology takes on quirks and characteristics of its own. Subtle feedback loops, shifting user behavior, and high-variance metrics start to blur the clean lines of statistical testing.
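To make those "clean lines" concrete, here is a minimal sketch of a textbook A/B comparison: a Welch's t-test on a per-user metric for a control and a treatment group. The data, group sizes, and significance threshold are illustrative assumptions, not results from this chapter; the quirks discussed later are precisely the conditions under which a test this simple can mislead.

```python
# A minimal sketch of a "textbook" A/B comparison: Welch's t-test on a
# per-user metric. All numbers are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated per-user metric (e.g., sessions per week) for each variant.
control = rng.normal(loc=5.0, scale=2.0, size=10_000)    # variant A
treatment = rng.normal(loc=5.1, scale=2.0, size=10_000)  # variant B

# Welch's t-test: does the treatment shift the mean of the metric?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

lift = treatment.mean() - control.mean()
print(f"observed lift: {lift:.3f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Conventional decision rule. Feedback loops, shifting user behavior,
# and high-variance metrics are what make a rule this simple unreliable
# when the feature under test is an AI model.
alpha = 0.05
print("significant" if p_value < alpha else "not significant")
```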
This chapter unpacks the quirks, pitfalls, and practical considerations of evaluating AI models in a real-world, online setting.