7 Model serving in practice

This chapter covers

  • Building a sample predictor with the model service approach
  • Building a sample service with TorchServe and the model server approach
  • Touring popular open source model serving libraries and systems
  • Explaining the production model release process
  • Discussing postproduction model monitoring

In the previous chapter, we discussed the concept of model serving, as well as user scenarios and design patterns. In this chapter, we will focus on the actual implementation of these concepts in production.

As we’ve said, one of the challenges of implementing model serving today is that there are too many possible ways of doing it. In addition to multiple black-box solutions, there are many options for customizing and building all or part of the system from scratch. We think the best way to teach you how to choose the right approach is with concrete examples.

7.1 A model service sample

7.1.1 Playing with the service

7.1.2 Service design

7.1.3 The frontend service

7.1.4 Intent classification predictor

7.1.5 Model eviction

7.2 TorchServe model server sample

7.2.1 Playing with the service

7.2.2 Service design

7.2.3 The frontend service

7.2.4 TorchServe backend

7.2.5 TorchServe API

7.2.6 TorchServe model files

7.2.7 Scaling up in Kubernetes

7.3 Model server vs. model service

7.4 Touring popular open source model serving libraries and systems

7.4.1 TensorFlow Serving