12 Measuring and reporting results
In the preceding chapters, we discussed the building blocks that form the backbone of a machine learning (ML) system, starting with data collection and model selection, continuing with metrics, losses, and validation splits, and ending with a comprehensive error analysis. With all these elements firmly established in the pipeline, it’s time to circle back to the system’s initial purpose and think about how to properly report the achieved results.
Reporting consists of evaluating our ML system’s performance against its final goal and sharing the results with teammates and stakeholders. In chapter 5, we introduced two types of metrics, online and offline, which in turn define two types of evaluation: offline testing and online testing. While offline testing is relatively straightforward, online testing involves running experiments in real-world scenarios. Typically, the most effective approach is a series of A/B tests, a crucial procedure in developing an efficient, properly working model, which we cover in section 12.2. A/B tests help capture metrics that either directly match or are highly correlated with our business goals.
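To make the idea of an online experiment concrete, the sketch below shows one common way to evaluate an A/B test: a two-proportion z-test on a conversion-style online metric. This is a generic illustration, not the book’s prescribed procedure; the function name and the sample numbers are invented for the example, and the test assumes large enough samples for the normal approximation to hold.

```python
# Illustrative sketch: two-proportion z-test for an A/B test where the
# online metric is a conversion rate. All numbers are made up.
from math import sqrt, erf

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Variant A: 200 conversions out of 10,000 users; variant B: 250 out of 10,000
p = ab_test_p_value(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"p-value: {p:.4f}")  # a small p-value suggests a real difference
```

A small p-value indicates the observed difference between variants is unlikely to be noise, which is the statistical backbone of the A/B testing procedure discussed in section 12.2.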