3 Kubeflow, an end-to-end AI/ML platform
This chapter covers
- Understanding how the tools provided by Kubeflow map to a data science workflow
- Using Jupyter notebooks to load data and train a model for an example use case of credit card fraud detection
- Using Kubeflow’s distributed training modules to parallelize the training of a neural network model over several pods
- Using Katib to search for the best model hyperparameters
- Deploying the model to create a REST endpoint that can be queried for predictions
- Using Pipelines to organize the individual steps in the workflow and run independent steps in parallel
- Understanding Open Data Hub and the additional tools it provides
In the previous chapter, we installed Kubeflow, used a Jupyter notebook to generate synthetic data and train a model, stored the data and the trained model in persistent S3 storage, and finally deployed the model to create a REST endpoint that could be queried for new predictions. That workflow was entirely sequential and did not exercise the distributed nature of a Kubernetes cluster or its ability to scale workloads out to multiple pods. In this chapter, we will use this scaling ability to parallelize and distribute tasks such as model training and hyperparameter tuning (the search for the best-performing model), and to execute independent parts of our workflow in parallel.
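As a preview of that last idea, the sketch below shows how two independent steps can be expressed with the Kubeflow Pipelines (KFP) SDK so that the platform is free to schedule their pods at the same time. The component and pipeline names here are illustrative only, and the decorator syntax assumes the KFP v2 SDK; we will build the real fraud-detection pipeline step by step later in the chapter.

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.11")
def prepare_split(split_name: str) -> str:
    # Stand-in for a data-preparation step (e.g., loading one data split).
    return f"prepared {split_name}"


@dsl.pipeline(name="parallel-steps-demo")
def demo_pipeline():
    # These two tasks share no inputs or outputs, so the pipeline engine
    # can run their pods in parallel on the cluster.
    prepare_split(split_name="train")
    prepare_split(split_name="test")


if __name__ == "__main__":
    # Compile to a YAML definition that can be uploaded to Kubeflow Pipelines.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```

Because the two tasks are declared without any data dependency between them, the pipeline engine infers that they can execute concurrently; this is the same principle we will rely on when we parallelize the larger workflow.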