12 Measuring and reporting results


This chapter covers

  • Measuring results
  • Benefiting from A/B tests
  • Reporting results

In the preceding chapters, we discussed the building blocks that form the backbone of a machine learning (ML) system, starting with data collection and model selection, continuing with metrics, losses, and the validation split, and ending with a comprehensive error analysis. With all these elements firmly established in the pipeline, it’s time to circle back to the system’s initial purpose and think about properly reporting the results it achieves.

Reporting consists of evaluating our ML system’s performance against its final goal and sharing the results with teammates and stakeholders. In chapter 5, we introduced two types of metrics, offline and online, which give rise to two corresponding types of evaluation: offline testing and online testing. While offline testing is relatively straightforward, online testing involves running experiments in real-world conditions. Typically, the most effective approach is a series of A/B tests, a crucial procedure for developing an efficient, properly working model, which we cover in section 12.2. A/B testing helps capture metrics that either directly match or are strongly correlated with our business goals.
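To make online testing concrete before diving into the details, the following minimal sketch (an illustration, not code from this book) shows how the readout of a simple A/B test on a binary online metric, such as conversion, might look in Python. The counts, the control/treatment split, and the 5% significance threshold are all hypothetical, and the sketch assumes the statsmodels library is installed.

from statsmodels.stats.proportion import proportions_ztest

# Hypothetical outcomes: conversions and impressions per variant
conversions = [530, 584]        # [control, treatment]
impressions = [10_000, 10_000]  # users exposed to each variant

# Two-proportion z-test: do the variants' conversion rates differ?
stat, p_value = proportions_ztest(count=conversions, nobs=impressions)
print(f"z = {stat:.3f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("No significant difference detected; keep collecting data.")

We return to experiment design, splitting strategies, metric selection, and statistical criteria for such tests in section 12.2.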

12.1 Measuring results

12.1.1 Model performance

12.1.2 Transition to business metrics

12.1.3 Simulated environment

12.1.4 Human evaluation

12.2 A/B testing

12.2.1 Experiment design

12.2.2 Splitting strategy

12.2.3 Selecting metrics

12.2.4 Statistical criteria

12.2.5 Simulated experiments

12.2.6 When A/B testing is not possible

12.3 Reporting results

12.3.1 Control and auxiliary metrics

12.3.2 Uplift monitoring

12.3.3 When to finish the experiment

12.3.4 What to report

12.3.5 Debrief document

12.4 Design document: Measuring and reporting

12.4.1 Measuring and reporting for Supermegaretail

12.4.2 Measuring and reporting for PhotoStock Inc.

Summary