9 Serving models with Kubernetes and Kubeflow
This chapter covers
- Understanding different methods of deploying and serving models in the cloud
- Serving Keras and TensorFlow models with TensorFlow Serving
- Deploying TensorFlow Serving to Kubernetes
- Using Kubeflow and KFServing to simplify the deployment process
In the previous chapter, we talked about model deployment with AWS Lambda and TensorFlow Lite.
In this chapter, we discuss the “serverful” approach to model deployment: we will serve the clothes classification model with TensorFlow Serving on Kubernetes. We’ll also talk about Kubeflow, an extension of Kubernetes that makes model deployment easier.
We’re going to cover a lot of material in this chapter, but Kubernetes is complex enough that we can’t go into full detail here. Because of that, we’ll often point to external resources for more in-depth coverage of some topics. But don’t worry: you will learn enough to feel comfortable deploying your own models with Kubernetes.
9.1 Kubernetes and Kubeflow
Kubernetes is a container orchestration platform. That sounds complex, but it’s essentially a platform for deploying Docker containers. Kubernetes takes care of exposing these containers as web services, and it scales these services up and down as the number of requests we receive changes.
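To give you an early taste of what this looks like in practice, here is a minimal sketch of the two Kubernetes objects involved: a Deployment, which keeps a set of container replicas running, and a Service, which exposes them as a single web service. The names, image tag, and port below are hypothetical placeholders, not the actual configuration we’ll use later in the chapter.

```yaml
# Deployment: asks Kubernetes to keep three replicas of the
# container running, restarting them if they crash.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clothing-model          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: clothing-model
  template:
    metadata:
      labels:
        app: clothing-model
    spec:
      containers:
        - name: clothing-model
          image: clothing-model:v1   # placeholder image
          ports:
            - containerPort: 8500
---
# Service: exposes all the replicas above behind one
# stable address inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: clothing-model
spec:
  selector:
    app: clothing-model
  ports:
    - port: 8500
      targetPort: 8500
```

Don’t worry about the details yet; the point is only that in Kubernetes we describe the desired state (three replicas, one service) declaratively, and the platform takes care of making it happen.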
Kubernetes is not the easiest tool to learn, but it’s very powerful, and it’s quite likely that you will need to use it at some point. That’s why we decided to cover it in this book.