4 Engineering system performance evaluations
This chapter covers
- Introducing engineering system performance evaluations
- Detailing why load time and latency matter
- Illustrating shadow traffic methodology
- Understanding the limitations of offline engineering performance evaluations
- Exploring example engineering performance metrics
What lies behind every AI model? Data? Definitely. Offline evaluations? Of course. A/B tests? I sure hope so. A complex, high-performing, highly scalable engineering system to fetch, serve, and generally orchestrate the AI model's output within a product? Absolutely.
Now, imagine waiting a full minute for Netflix, a product powered by AI, to load just to see your recommendations (unless that’s a sign from the universe to go for a walk or pick up a book). If your AI-powered product is slow, users will bounce, no matter how good your model is.
In the previous chapter, we explored offline diagnostic evaluations, which help uncover hidden issues, biases, or weaknesses that model performance metrics alone might overlook. Engineering performance metrics, on the other hand, serve a different purpose: they reveal scaling challenges, latency degradation, and potential vulnerabilities from a system design perspective, all of which are crucial to address when deploying a model to production.
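To make "engineering performance metrics" concrete before we dive in, here is a minimal sketch of measuring request latency percentiles against a model-serving endpoint. The endpoint URL and the helper name `measure_latency_ms` are illustrative assumptions, not a prescribed setup; the point is simply that latency is something you measure empirically, with attention to the slow tail, not just the average.

```python
import time
import statistics
import requests  # third-party HTTP client, assumed available

# Hypothetical model-serving endpoint; replace with your own.
SERVING_URL = "http://localhost:8080/recommendations"

def measure_latency_ms(n_requests: int = 100) -> list[float]:
    """Send n_requests sequential requests and record wall-clock latency in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        requests.get(SERVING_URL, timeout=5)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

latencies = sorted(measure_latency_ms())
# p50 captures the typical user; p95 and p99 capture the slow tail
# that engineering performance evaluations care most about.
print(f"p50: {statistics.median(latencies):.1f} ms")
print(f"p95: {latencies[int(0.95 * len(latencies))]:.1f} ms")
print(f"p99: {latencies[int(0.99 * len(latencies))]:.1f} ms")
```

We will build on this idea throughout the chapter, moving from one-off measurements like this toward systematic methodologies such as shadow traffic.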