8 Experimentation in Action: Finalizing an MVP with MLflow and runtime optimization
This chapter covers
- Recording ML experiment metrics in a sustainable, historically referenceable way with MLflow (previewed in the sketch after this list).
- Accelerating parallelizable modeling tasks through concurrency and hybrid approaches to reduce runtime cost.
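As a quick preview of the first item, the sketch below shows the basic shape of metric tracking with MLflow: each tuning or training run is opened as a tracked run, and its parameters and metrics are logged to the tracking server so they remain queryable long after the run finishes. The experiment name, parameter, and metric values here are placeholders for illustration, not the actual project values used later in the chapter.

```python
import mlflow

# Hypothetical experiment name, chosen for illustration only.
mlflow.set_experiment("forecasting_mvp")

# Each tuning iteration becomes a tracked run whose parameters and
# metrics are persisted, making results historically referenceable.
with mlflow.start_run(run_name="tuning_trial_001"):
    mlflow.log_param("model_order", "(2, 1, 2)")  # placeholder parameter
    mlflow.log_metric("mae", 12.34)               # placeholder metric values
    mlflow.log_metric("mse", 310.7)
```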
In the previous chapter, we arrived at a solution to one of the most demoralizing tasks that we face as ML practitioners: fine-tuning models. With techniques that take the tedium out of tuning, we can greatly reduce the risk of producing ML-backed solutions that are inaccurate to the point of being worthless. In applying those techniques, however, we quietly welcomed an enormous elephant into the room of our project: tracking.
With the example that we’ve been using throughout the last several chapters (the time series model), we are required to retrain our models each time we perform inference. For the vast majority of other supervised learning tasks, this won’t be the case. Those other applications of modeling (both supervised and unsupervised) will have periodic retraining events, with each model called on for inference (prediction) many times between training events.