4 Pipelines: a deeper look

 

This chapter covers

  • Using pipelines to organize and execute a full machine learning workflow as pods, gaining the scalability, parallelism, and reproducibility that containers and Kubernetes provide.
  • Using Elyra, a high-level tool that automatically converts data science notebooks into pipeline components, to reduce the overhead of learning the low-level Pipelines API.
  • Automating pipeline runs through events and triggers such as timers and data store events.

In the previous chapter, we gave an overview of the tools provided by Kubeflow for executing data science workflows in a distributed environment. In particular, we introduced Pipelines, which decompose a workflow into steps and schedule and execute each step in its own container.

In this chapter, we dive deeper into pipelines and demonstrate how to convert a Python-based workflow into a Kubeflow pipeline using its domain-specific language (DSL), with hyperparameter search and tuning as the running example. Although hyperparameter tuning is a specific task, it stands in for a typical machine learning workflow. Our hope is that, after reading this chapter, you will be able to execute your own workflows as pipelines, seamlessly convert notebooks to pipelines with Elyra, and automate pipeline runs in production.
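To give a first feel for the DSL before we work through the chapter's example, here is a minimal sketch using the kfp SDK's v1-style API: a plain Python training function is wrapped as a containerized component and fanned out over a small hyperparameter grid, so each trial runs in its own pod. The function body, base image, and hyperparameter values are illustrative placeholders, not the workflow developed in this chapter.

import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def train(learning_rate: float, n_estimators: int) -> float:
    """Train a model with one hyperparameter combination and return its score."""
    # Placeholder logic; a real component would load data and fit a model here.
    score = 1.0 / (1.0 + learning_rate * n_estimators)
    return score


# Wrap the plain Python function as a pipeline component that runs in a container.
train_op = create_component_from_func(train, base_image="python:3.9")


@dsl.pipeline(
    name="hyperparameter-search",
    description="Run one training step per hyperparameter combination.",
)
def hp_search_pipeline():
    # Each combination becomes its own step (pod), so the trials can run in parallel.
    for lr in (0.01, 0.1):
        for n in (50, 100):
            train_op(learning_rate=lr, n_estimators=n)


if __name__ == "__main__":
    # Compile to a workflow spec that can be uploaded through the Pipelines UI or API.
    kfp.compiler.Compiler().compile(hp_search_pipeline, "hp_search_pipeline.yaml")

The key point is that the DSL code only describes the graph of steps; the compiled YAML is what the Pipelines backend (Argo Workflows or Tekton, covered in sections 4.1.1 and 4.1.2) actually schedules on the cluster.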

4.1 Pipelines

4.1.1 Kubeflow Pipelines: Argo Workflows

4.1.2 Kubeflow Pipelines: Tekton

4.2 Elyra: From notebook to pipelines

4.3 MLOps: Automating pipeline executions

4.3.1 Kubeflow Run Trigger

4.3.2 Data Store Triggered Runs

4.4 Summary