11 Deploying and serving NLP applications

 

This chapter covers

  • Choosing the right architecture for your NLP application
  • Version-controlling your code, data, and model
  • Deploying and serving your NLP model
  • Interpreting and analyzing model predictions with LIT (Language Interpretability Tool)

Whereas chapters 1 through 10 of this book are about building NLP models, this chapter covers everything that happens outside NLP models. Why is this important? Isn’t NLP all about building high-quality ML models? It may come as a surprise if you don’t have much experience with production NLP systems, but a large portion of an NLP system has very little to do with NLP at all. As shown in figure 11.1, only a tiny fraction of a typical real-world ML system is the ML code; that “ML code” part is supported by numerous components that provide functionality such as data collection, feature extraction, and serving.

Let’s use a nuclear power plant as an analogy. In operating a nuclear power plant, only a tiny fraction of the work concerns the nuclear reaction itself. Everything else is a vast and complex infrastructure that supports the safe and efficient generation and transmission of electricity and the handling of materials: how to use the generated heat to turn the turbines and make electricity, how to cool and circulate water safely, how to transmit the electricity efficiently, and so on. All of that supporting infrastructure has little to do with nuclear physics.

11.1 Architecting your NLP application

11.1.1 Before machine learning

11.1.2 Choosing the right architecture

11.1.3 Project structure

11.1.4 Version control

11.2 Deploying your NLP model

11.2.1 Testing

11.2.2 Train-serve skew

11.2.3 Monitoring

11.2.4 Using GPUs

11.3 Case study: Serving and deploying NLP applications

11.3.1 Serving models with TorchServe

11.3.2 Deploying models with SageMaker

11.4 Interpreting and visualizing model predictions