3 Kubeflow, an end-to-end AI/ML platform
This chapter covers
- Understanding how the tools provided by Kubeflow map to a data science workflow
- Using Jupyter notebooks to load data and train a model for an example use case of credit card fraud detection
- Using Kubeflow’s distributed training modules to parallelize the training of a neural network model over several pods
- Using Katib to search for the best model hyperparameters
- Deploying the model to create a REST endpoint that can be queried for predictions
- Using Pipelines to organize the individual steps in the workflow and run independent steps in parallel
- Understanding Open Data Hub and the additional tools it provides
In the previous chapter, we installed Kubeflow, used a Jupyter notebook to generate synthetic data and train a model, stored the data and the trained model in persistent S3 storage, and finally deployed the model to create a REST endpoint that could be queried for new predictions. That workflow was entirely sequential and did not exercise the distributed nature of a Kubernetes cluster or its ability to scale workloads out to multiple pods. In this chapter, we will use this scaling ability to parallelize and distribute tasks such as model training and hyperparameter tuning (the search for the best-performing model), and to execute independent parts of our workflow in parallel.
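As a preview of that last idea, the sketch below shows how two independent steps can be expressed with the Kubeflow Pipelines (KFP) SDK so that the platform is free to schedule their pods at the same time. The component and pipeline names here are illustrative only, and the decorator syntax assumes the KFP v2 SDK; we will build the real fraud-detection pipeline step by step later in the chapter.

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def prepare_split(split_name: str) -> str:
    # Stand-in for a data-preparation step (e.g., loading one data split).
    return f"prepared {split_name}"


@dsl.pipeline(name="parallel-steps-demo")
def demo_pipeline():
    # These two tasks share no inputs or outputs, so the pipeline engine
    # can run their pods in parallel on the cluster.
    prepare_split(split_name="train")
    prepare_split(split_name="test")


if __name__ == "__main__":
    # Compile to a YAML definition that can be uploaded to Kubeflow Pipelines.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```

Because the two tasks are declared without any data dependency between them, the pipeline engine infers that they can execute concurrently; this is the same principle we will rely on when we parallelize the larger workflow.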