
9 Model serving in practice

This chapter covers

  • Building a sample predictor with the model service approach
  • Building a sample service with TorchServe and the model server approach
  • Touring popular open source model serving libraries and systems
  • Explaining the production model release process
  • Discussing post-production model monitoring

In the previous chapter, we discussed the concept of model serving, along with common user scenarios and design patterns. In this chapter, we focus on implementing those concepts in production.

As we’ve said, one of the challenges of implementing model serving today is that there are too many possible ways of doing it. In addition to multiple black-box solutions, there are many options for customizing and building all or part of a serving system from scratch. We think the best way to build your intuition for choosing the right approach is through concrete examples.

9.1 Model service sample

9.1.1 Play with service

9.1.2 Service design

9.1.3 The frontend service

9.1.4 Intent classification predictor

9.1.5 Model eviction

9.2 TorchServe model server sample

9.2.1 Play with service

9.2.2 Service design

9.2.3 The frontend service

9.2.4 TorchServe backend

9.2.5 TorchServe API

9.2.6 TorchServe model files

9.2.7 Scaling up in Kubernetes

9.3 Model server vs. model service

9.4 Touring open source model serving tools

9.4.1 TensorFlow Serving