17 Deploying to production
This chapter covers
- Options for deploying PyTorch models
- Deploying models with web frameworks and APIs
- Optimizing inference performance
- Exporting models for various deployment targets
- Running exported and natively implemented models from C++
In part 1 of this book, we learned a lot about models, and part 2 left us with a detailed path for creating good models for a particular problem. Now that we have these great models, we need to take them where they can be useful. Running inference for deep learning models at scale affects both the architecture of a system and its cost, so the infrastructure behind it deserves careful thought. While PyTorch started as a research-focused framework, it has evolved significantly and now includes production-oriented features that make it well suited as an end-to-end platform for both research and large-scale production.
What deploying to production means will vary with the use case: