9 Serving models with Kubernetes and Kubeflow


This chapter covers

  • Understanding different methods of deploying and serving models in the cloud
  • Serving Keras and TensorFlow models with TensorFlow Serving
  • Deploying TensorFlow Serving to Kubernetes
  • Using Kubeflow and KFServing to simplify the deployment process

In the previous chapter, we talked about model deployment with AWS Lambda and TensorFlow Lite.

In this chapter, we discuss the “serverful” approach to model deployment: we serve the clothing classification model with TensorFlow Serving running on Kubernetes. We also talk about Kubeflow, a machine-learning toolkit for Kubernetes that makes model deployment easier.
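
As a quick preview of where we’re heading, here is roughly what invoking a TensorFlow Serving model over its REST API looks like from Python. This is a sketch, not this chapter’s final code: the host and port, the model name (clothing-model), and the 299×299 input shape are assumptions that depend on how we export and run the model later, and TF Serving also speaks gRPC, which a gateway service can use instead.

# A minimal sketch, assuming TF Serving runs locally on its default
# REST port (8501) and serves a model named "clothing-model"
import requests

url = 'http://localhost:8501/v1/models/clothing-model:predict'

# The REST predict API expects a JSON body with "instances": a batch of inputs
row = [[0.0, 0.0, 0.0]] * 299  # one row of 299 RGB pixels
image = [row] * 299            # a placeholder 299x299x3 image of zeros
X = [image]                    # a batch with a single image

response = requests.post(url, json={'instances': X})
predictions = response.json()['predictions']
print(predictions)

Note that the model itself stays behind TF Serving: clients only exchange JSON (or gRPC messages) with it, which is what makes it easy to put a thin gateway service in front of it later.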

We’re going to cover a lot of material in this chapter, but Kubernetes is such a large system that we can’t go into every detail. Because of that, we often refer to external resources for more in-depth coverage of some topics. But don’t worry: you will learn enough to feel comfortable deploying your own models with it.

9.1 Kubernetes and Kubeflow

Kubernetes is a container orchestration platform. That sounds complex, but at its core it’s a platform for running Docker containers. It takes care of exposing these containers as web services and scales these services up and down as the number of requests we receive changes.
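
To make this more concrete, below is a minimal sketch of the two core objects we hand to Kubernetes: a Deployment, which says which container to run and in how many copies, and a Service, which exposes those copies as a single web service. It uses the official Kubernetes Python client (the kubernetes package); the image name, labels, ports, and replica count are hypothetical placeholders, not values from this chapter.

# A minimal sketch, assuming the official Kubernetes Python client
# (pip install kubernetes) and a cluster configured in ~/.kube/config.
# The image name, labels, and ports are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()  # read the same config that kubectl uses

# A Deployment describes which container to run and in how many copies
container = client.V1Container(
    name='ping',
    image='ping:v1',  # hypothetical Docker image with a web service inside
    ports=[client.V1ContainerPort(container_port=9696)],
)

deployment = client.V1Deployment(
    api_version='apps/v1',
    kind='Deployment',
    metadata=client.V1ObjectMeta(name='ping'),
    spec=client.V1DeploymentSpec(
        replicas=2,  # keep two copies running; change this to scale
        selector=client.V1LabelSelector(match_labels={'app': 'ping'}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={'app': 'ping'}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(
    namespace='default', body=deployment,
)

# A Service exposes the running containers behind one stable address
service = client.V1Service(
    api_version='v1',
    kind='Service',
    metadata=client.V1ObjectMeta(name='ping'),
    spec=client.V1ServiceSpec(
        selector={'app': 'ping'},  # route traffic to pods with this label
        ports=[client.V1ServicePort(port=80, target_port=9696)],
    ),
)

client.CoreV1Api().create_namespaced_service(
    namespace='default', body=service,
)

Scaling up and down is then just a matter of changing replicas: Kubernetes continuously compares what is running in the cluster with what the Deployment declares and converges the two.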

Kubernetes is not the easiest tool to learn, but it’s very powerful, and you will likely need to use it at some point. That’s why we decided to cover it in this book.

9.2 Serving models with TensorFlow Serving

9.2.1 Overview of the serving architecture

9.2.2 The SavedModel format

9.2.3 Running TensorFlow Serving locally

9.2.4 Invoking the TF Serving model from Jupyter

9.2.5 Creating the Gateway service

9.3 Model deployment with Kubernetes

9.3.1 Introduction to Kubernetes

9.3.2 Creating a Kubernetes cluster on AWS

9.3.3 Preparing the Docker images

9.3.4 Deploying to Kubernetes

9.3.5 Testing the service

9.4 Model deployment with Kubeflow

Summary