15 Serving and inference optimization
This chapter covers
- Challenges that may arise during the serving and inference stage
- Tools and frameworks for serving models efficiently
- Optimizing inference pipelines
Making your machine learning (ML) model run in a production environment is one of the final steps toward an efficient operating lifecycle for your system. Some ML practitioners show little interest in this part of the craft, preferring to focus on developing and training their models. This can be a mistake, however: a model is only useful once it is deployed and effectively used in production. In this chapter, we discuss the challenges of deploying and serving ML models and review different methods for optimizing the inference process.