4 Engineering system performance evaluations
This chapter covers
- Introducing engineering system performance evaluations
- Detailing why load time and latency matter
- Illustrating shadow traffic methodology
- Understanding the limitations of offline engineering performance evaluations
- Exploring example engineering performance metrics
What lies behind every AI model? Data? Definitely. Offline evaluations? Of course. A/B tests? I sure hope so. A complex, high-performing, highly scalable engineering system to fetch, serve, and generally orchestrate the AI model's output within a product? Absolutely.
Now, imagine waiting a full minute for Netflix, a product powered by AI, to load just to see your recommendations (unless that’s a sign from the universe to go for a walk or pick up a book). If your AI-powered product is slow, users will bounce, no matter how good your model is.
In the previous chapter, we explored offline diagnostic evaluations, which help uncover hidden issues, biases, or weaknesses that model performance metrics alone might overlook. Engineering performance metrics, on the other hand, serve a different purpose: they reveal scaling challenges, latency degradation, and potential vulnerabilities from a system design perspective, all of which are crucial to address when deploying a model to production.
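To make "engineering performance metrics" concrete before we dive in, here is a minimal sketch of measuring request latency percentiles against a model-serving endpoint. The endpoint URL and the helper name `measure_latency_ms` are illustrative assumptions, not a prescribed setup; the point is simply that latency is something you measure empirically, with attention to the slow tail, not just the average.

```python
import time
import statistics
import requests  # third-party HTTP client, assumed available

# Hypothetical model-serving endpoint; replace with your own.
SERVING_URL = "http://localhost:8080/recommendations"

def measure_latency_ms(n_requests: int = 100) -> list[float]:
    """Send n_requests sequential requests and record wall-clock latency in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        requests.get(SERVING_URL, timeout=5)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

latencies = sorted(measure_latency_ms())
# p50 captures the typical user; p95 and p99 capture the slow tail
# that engineering performance evaluations care most about.
print(f"p50: {statistics.median(latencies):.1f} ms")
print(f"p95: {latencies[int(0.95 * len(latencies))]:.1f} ms")
print(f"p99: {latencies[int(0.99 * len(latencies))]:.1f} ms")
```

We will build on this idea throughout the chapter, moving from one-off measurements like this toward systematic methodologies such as shadow traffic.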