9 Serving models with Kubernetes and Kubeflow


This chapter covers

  • Understanding different methods of deploying and serving models in the cloud
  • Serving Keras and TensorFlow models with TensorFlow-Serving
  • Deploying TensorFlow-Serving to Kubernetes
  • Using Kubeflow and KFServing for simplifying the deployment process

In the previous chapter, we talked about model deployment with AWS Lambda and TensorFlow-Lite.

In this chapter, we discuss the “serverful” approach to model deployment: we will serve the clothes classification model with TensorFlow-Serving on Kubernetes. We’ll also cover Kubeflow, an extension of Kubernetes that makes model deployment easier.

We’re going to cover a lot of material in this chapter, but Kubernetes is complex enough that it’s simply not possible to go deep into the details. Because of that, we’ll often refer to external resources for more in-depth coverage of some topics. But don’t worry: you will learn enough to feel comfortable deploying your own models with it.

9.1       Kubernetes and Kubeflow

Kubernetes is a container orchestration platform. That may sound complex, but it’s essentially a place where we can deploy Docker containers. It takes care of exposing these containers as web services and scales the services up and down as the number of requests we receive changes.
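To give a rough idea of what this looks like in practice, here’s a minimal sketch of a Kubernetes manifest that deploys a container and exposes it as a web service. The names, image, and port are placeholders for illustration, not the exact configuration we’ll use later in the chapter:

```yaml
# A Deployment tells Kubernetes to keep a number of copies
# of a container running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: clothing-model          # placeholder name
spec:
  replicas: 2                   # Kubernetes keeps 2 copies running
  selector:
    matchLabels:
      app: clothing-model
  template:
    metadata:
      labels:
        app: clothing-model
    spec:
      containers:
        - name: clothing-model
          image: tensorflow/serving:2.7.0   # placeholder image
          ports:
            - containerPort: 8500
---
# A Service exposes the pods above under a single stable address
# and load-balances requests across them.
apiVersion: v1
kind: Service
metadata:
  name: clothing-model
spec:
  selector:
    app: clothing-model
  ports:
    - port: 8500
      targetPort: 8500
```

The Deployment describes *what* should run and how many replicas; the Service describes *how* to reach it. We’ll build real manifests like these step by step in section 9.3.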

Kubernetes is not the easiest tool to learn, but it’s powerful, and it’s likely that you will need to use it at some point. That’s why we decided to cover it in this book.

9.2       Serving models with TensorFlow-Serving

9.2.1   Overview of the serving architecture

9.2.2   The saved_model format

9.2.3   Running TensorFlow-Serving locally

9.2.4   Invoking the TF-Serving model from Jupyter

9.2.5   Creating the Gateway service

9.3       Model deployment with Kubernetes

9.3.1   Introduction to Kubernetes

9.3.2   Creating a Kubernetes cluster on AWS

9.3.3   Preparing the Docker images

9.3.4   Deploying to Kubernetes

9.3.5   Testing it

9.4       Model deployment with Kubeflow

9.4.1   Preparing the model: uploading it to S3

9.4.2   Deploying TensorFlow models with KFServing

9.4.3   Accessing the model

9.4.4   KFServing Transformers

9.4.5   Testing the transformer

9.4.6   Deleting the EKS cluster

9.5       Next steps

9.5.1   Exercises

9.5.2   Other projects

9.6       Summary