5 Orchestrating ML pipelines
This chapter covers
- Building a batch pipeline for model inference using Kubeflow Pipelines
In Chapter 4, we established reliable tracking of ML experiments with MLflow and feature management with Feast. However, these tools still require manual intervention to coordinate model training, feature updates, and inference. This is where pipeline orchestration becomes crucial (Figure 5.1).
Figure 5.1 The mental map, where we are now focusing on Kubeflow Pipelines orchestration (A).
In this chapter, we'll use Kubeflow Pipelines to automate these workflows, making our ML systems more scalable and reproducible. Through a practical income classification example, we'll see how to turn manual steps into automated, reusable pipeline components. All the code for this chapter is available on GitHub: https://github.com/practical-mlops/chapter-5
5.1 Kubeflow Pipelines, the task orchestrator
Most machine learning inference pipelines share a common structure: we retrieve data from somewhere (an object store, a data warehouse, or a file system), preprocess that data, retrieve or load a model, and then perform inference. The predictions are then written to a database or uploaded to cloud storage. Such a pipeline needs to run periodically, and we may need to pass it runtime parameters such as a date/time or a feature table name. All of this is possible with Kubeflow Pipelines.
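To make this structure concrete before we dive in, here is a minimal sketch of such a batch inference pipeline using the Kubeflow Pipelines (KFP) v2 Python SDK. The component bodies are stubs, and the names (`load_data`, `preprocess`, `run_inference`, the `feature_table` and `model_uri` parameters) are hypothetical placeholders, not the code we build later in this chapter.

```python
# A skeletal batch inference pipeline: load data -> preprocess -> infer.
from kfp import dsl, compiler


@dsl.component(base_image="python:3.11")
def load_data(feature_table: str, run_date: str,
              raw_data: dsl.Output[dsl.Dataset]):
    # Hypothetical: pull rows for `run_date` from `feature_table`
    # (object store, warehouse, ...) and write them to the output artifact.
    with open(raw_data.path, "w") as f:
        f.write(f"rows from {feature_table} for {run_date}\n")


@dsl.component(base_image="python:3.11")
def preprocess(raw_data: dsl.Input[dsl.Dataset],
               features: dsl.Output[dsl.Dataset]):
    # Hypothetical feature engineering step; here it just copies the data.
    with open(raw_data.path) as src, open(features.path, "w") as dst:
        dst.write(src.read())


@dsl.component(base_image="python:3.11")
def run_inference(features: dsl.Input[dsl.Dataset], model_uri: str,
                  predictions: dsl.Output[dsl.Dataset]):
    # Hypothetical: load the model from `model_uri`, score the features,
    # and write the predictions to the output artifact.
    with open(predictions.path, "w") as f:
        f.write(f"predictions from {model_uri}\n")


@dsl.pipeline(name="batch-inference")
def batch_inference_pipeline(feature_table: str, run_date: str,
                             model_uri: str):
    # Runtime parameters (date, feature table, model location) flow in here
    # and can be set per run, e.g., by a recurring schedule.
    data = load_data(feature_table=feature_table, run_date=run_date)
    feats = preprocess(raw_data=data.outputs["raw_data"])
    run_inference(features=feats.outputs["features"], model_uri=model_uri)


if __name__ == "__main__":
    # Compile to a YAML package that can be uploaded to a KFP cluster.
    compiler.Compiler().compile(batch_inference_pipeline,
                                "batch_inference.yaml")
```

Each `@dsl.component` function runs in its own container, and artifacts are passed between steps through the `Input`/`Output` annotations; the compiled YAML can then be scheduled as a recurring run so the pipeline executes periodically without manual intervention.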