Chapter 15: Deploying to production
This chapter covers:
- Options for deploying PyTorch models
- Working with the PyTorch JIT
- Deploying a model server and exporting models
- Running exported and natively implemented models from C++
- Running models on mobile
In part 1 of this book, we learned a lot about models, and part 2 left us with a detailed path for creating good models for a particular problem. Now that we have these great models, we need to take them where they can be useful. Maintaining the infrastructure for running deep learning inference at scale matters from both an architectural and a cost standpoint. While PyTorch started out as a framework focused on research, beginning with the 1.0 release a set of production-oriented features was added that today makes PyTorch an ideal end-to-end platform from research to large-scale production.
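As a small taste of what those production-oriented features look like, here is a minimal sketch of exporting a model with the PyTorch JIT via tracing; the toy model and the file name `model.pt` are placeholders standing in for the models we built in earlier chapters, and tracing itself is covered in depth later in this chapter:

```python
import torch
from torch import nn

# A toy model standing in for the models from parts 1 and 2 (hypothetical).
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# torch.jit.trace records the operations performed on an example input,
# producing a TorchScript module that no longer needs the Python source.
example_input = torch.randn(1, 16)
traced = torch.jit.trace(model, example_input)

# The serialized module can later be loaded from Python or from C++.
traced.save("model.pt")
```

The exported file is self-contained, which is exactly what makes deployment targets like model servers, C++ programs, and mobile apps possible.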
What deploying to production means will vary with the use case: