12 Machine learning pipeline


This chapter covers

  • Understanding machine learning pipelines with experiment management and hyperparameter optimization
  • Implementing Docker containers for the DC taxi model to reduce boilerplate code
  • Deploying a machine learning pipeline to train the model

Thus far, you have learned about the individual stages, or steps, of machine learning in isolation. Focusing on one step at a time helped concentrate your effort on a more manageable scope of work. However, to deploy a production machine learning system, you need to integrate these steps into a single pipeline, with the outputs of each step flowing into the inputs of the subsequent steps. Further, the pipeline should be flexible enough for a hyperparameter optimization (HPO) process to manage and experiment with the specific tasks executed across its stages.
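The flow of outputs into inputs, with hyperparameters reaching into individual stages, can be sketched as a chain of functions. The step names, hyperparameters, and values below are hypothetical illustrations, not the actual DC taxi implementation:

```python
# A minimal sketch of a machine learning pipeline: each step returns a value
# that becomes the input of the next step. Step names and hyperparameters are
# hypothetical, chosen only to illustrate the chaining.

def ingest_data():
    # stand-in for querying the source data set
    return [1.0, 2.0, 3.0, 4.0]

def transform(records, scale):
    # a feature-engineering step parameterized by a hyperparameter
    return [r * scale for r in records]

def train(features, learning_rate):
    # stand-in for model training; returns a "model" (here, just a scaled mean)
    return sum(features) / len(features) * learning_rate

def run_pipeline(hparams):
    # hyperparameters flow into the specific tasks of each stage, so an HPO
    # process can re-run the whole pipeline with different settings
    records = ingest_data()
    features = transform(records, scale=hparams["scale"])
    model = train(features, learning_rate=hparams["learning_rate"])
    return model

print(run_pipeline({"scale": 2.0, "learning_rate": 0.1}))  # prints 0.5
```

Because the entire pipeline is a single callable taking a hyperparameter dictionary, an HPO process can treat it as the objective to evaluate repeatedly.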

In this chapter, you will learn about the concepts and tools you can use to integrate the machine learning pipeline, deploy it to AWS, and train a DC taxi fare estimation model using experiment management and hyperparameter optimization.

12.1 Describing the machine learning pipeline

This section introduces the core concepts needed to explain the machine learning pipeline implementation described in this chapter.

12.2 Enabling PyTorch-distributed training support with Kaen

12.2.1 Understanding PyTorch-distributed training settings
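As a preview of this section, PyTorch-distributed training with the default env:// rendezvous is configured through a small set of environment variables: MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE. The sketch below only stages these settings with illustrative values; it does not launch an actual PyTorch process:

```python
import os

# The standard environment variables PyTorch reads when initializing
# distributed training with the default env:// rendezvous. The values below
# are illustrative: a single-node, single-process "cluster".
dist_settings = {
    "MASTER_ADDR": "127.0.0.1",  # address of the rank-0 (master) node
    "MASTER_PORT": "23456",      # port the master listens on for rendezvous
    "RANK": "0",                 # global index of this training process
    "WORLD_SIZE": "1",           # total number of training processes
}
os.environ.update(dist_settings)

# With these set, a training script could then call, for example:
# torch.distributed.init_process_group(backend="gloo", init_method="env://")
```

Every process in the training cluster must agree on MASTER_ADDR, MASTER_PORT, and WORLD_SIZE, while RANK is unique per process.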

12.3 Unit testing model training in a local Kaen container

12.4 Hyperparameter optimization with Optuna
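Before looking at Optuna specifically, the essence of HPO can be sketched as a loop that samples hyperparameter values, evaluates an objective, and keeps the best trial. The stdlib sketch below mimics the shape of such a study; the objective function is a hypothetical stand-in for model training, and libraries like Optuna automate this loop with smarter samplers and trial pruning:

```python
import random

def objective(hparams):
    # hypothetical stand-in for training a model and returning a validation
    # loss; minimized when learning_rate == 0.01
    return (hparams["learning_rate"] - 0.01) ** 2

def optimize(n_trials, seed=42):
    # sample hyperparameters, evaluate the objective, and track the best
    # trial -- the loop that an HPO library automates
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        # sample the learning rate log-uniformly from [1e-5, 1e-1]
        hparams = {"learning_rate": 10 ** rng.uniform(-5, -1)}
        loss = objective(hparams)
        if best is None or loss < best[0]:
            best = (loss, hparams)
    return best

loss, hparams = optimize(n_trials=50)
```

Sampling the learning rate on a log scale is the usual choice, since plausible values span several orders of magnitude.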

12.4.1 Enabling MLFlow support

12.4.2 Using HPO for DcTaxiModel in a local Kaen provider

12.4.3 Training with the Kaen AWS provider

Summary