Part 2 Early stage
In this part, we dive deeper into the technical details of early-stage work. Chapter 5 covers the benefits of selecting proper metrics and losses for your ML system, how to define and use proxy metrics, and how to apply the hierarchy of metrics. Chapter 6 is dedicated to datasets, from choosing optimal data sources and processing raw data to defining the properties of a healthy data pipeline and deciding how much data is enough for the best performance of the ML model. Chapter 7 reviews standard and nontrivial validation schemas, describes the split updating procedure, and gives an overview of validation schemas as part of the design document. In chapter 8, you will learn more about various types of baselines, from constant baselines, the earliest and simplest yet highly efficient version of a model, to model baselines, feature baselines, and deep learning baselines.