8 Pitfalls of online metrics
This chapter covers
- Detailing pitfalls of online metrics unique to AI models
- Illustrating ideal metric frameworks that align with model and product goals
- Exploring technical refinements to strengthen evaluations
There’s probably a strategy to avoid getting tricked by your own metrics…right? Well, yes, of course, but it requires you to be intentional and thoughtful about it. If you’re not, it’s easy for your A/B test online evaluations to spin their wheels without giving you insights you can actually trust.
In Chapter 7 we discussed how to bridge offline signals into A/B test evaluations and how to set up your model for online testing in a way that reflects product strategy and stakeholder alignment. This chapter picks up from there. It may read like a small manifesto on how not to be fooled by online metrics, but it’s more than that: we’ll also explore the principles of a healthy metric framework and some technical refinements that get more out of the precious time your model spends in an online A/B test.
Hopefully by the end, you’ll see this chapter as the capstone of Part 2’s focus on online evaluation of AI models, underscoring that online metrics aren’t enough on their own. Balanced evaluations (offline, online, human, and LLM-as-a-judge) are what make AI development truly trustworthy, which is what this book is all about!
8.1 Choosing metrics that actually matter
Almost every AI model has two key characteristics: