8 Serverless deep learning


This chapter covers

  • Serving models with TensorFlow Lite, a lightweight environment for applying TensorFlow models
  • Deploying deep learning models with AWS Lambda
  • Exposing the Lambda function as a web service via API Gateway

In the previous chapter, we trained a deep learning model for categorizing images of clothes. Now we need to deploy it: make the model available to other services.

There are many possible ways to do this. We already covered the basics of model deployment in Chapter 5, where we used Flask, Docker, and AWS Elastic Beanstalk to deploy a logistic regression model.

In this chapter, we’ll look at a serverless approach to deploying models, using AWS Lambda.

8.1       Serverless: AWS Lambda

AWS Lambda is a service from Amazon. Its main promise is to let you “run code without thinking about servers.”

It lives up to that promise: with AWS Lambda, we only need to upload our code. The service takes care of running it and scales it up and down according to the load.

Additionally, we pay only for the time the function is actually running. When nobody invokes our service, we pay nothing.
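To make this concrete, here is a minimal sketch of what a Lambda function looks like. The `url` field name and the `predict` stub are illustrative assumptions, not the chapter’s actual code; the real prediction logic is built step by step in the sections that follow.

```python
def predict(url):
    # Stub for illustration: the real function, developed later in this
    # chapter, would download the image at `url` and apply the TF-Lite model.
    return {'dress': 0.9, 't-shirt': 0.1}


def lambda_handler(event, context):
    # AWS Lambda invokes this function, passing the request payload as
    # `event`. The 'url' field name is an assumption for this sketch.
    url = event['url']
    results = predict(url)
    return results
```

Lambda then calls `lambda_handler` for every incoming request; we never start, configure, or shut down a server ourselves.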

In this chapter, we’ll use AWS Lambda to deploy the model we trained previously. For that, we’ll also use TensorFlow Lite, a lightweight version of TensorFlow that contains only the most essential functionality.
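As a preview of the conversion step covered later in this chapter, the sketch below converts a Keras model to the TF-Lite format with `tf.lite.TFLiteConverter`. The small `Sequential` model is a stand-in; in the chapter’s workflow you would instead load the trained clothing model (e.g., with `keras.models.load_model`), and the output filename is an assumption.

```python
import tensorflow as tf
from tensorflow import keras

# Stand-in model for illustration; in practice, load the trained model
# from the previous chapter instead.
model = keras.Sequential([keras.layers.Dense(10, input_shape=(4,))])

# Convert the Keras model to the TF-Lite flat-buffer format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted model; the filename here is an assumption
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```

The resulting `.tflite` file is much smaller than a full TensorFlow installation, which is what makes it a good fit for the size-constrained Lambda environment.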

8.1.1   TensorFlow-Lite

8.1.2   Converting the model to TF-Lite format

8.1.3   Preparing the images

8.1.4   Using the TensorFlow-Lite model

8.1.5   Code for the Lambda function

8.1.6   Preparing the Docker image

8.1.7   Pushing the image to AWS ECR

8.1.8   Creating the Lambda function

8.1.9   Creating the API Gateway

8.2       Next steps

8.2.1   Exercises

8.2.2   Other projects

8.3       Summary