10 Best practices in developing NLP applications


This chapter covers

  • Making neural network inference more efficient by sorting, padding, and masking tokens
  • Applying character-based and BPE tokenization for splitting text into tokens
  • Avoiding overfitting via regularization
  • Dealing with imbalanced datasets by using upsampling, downsampling, and loss weighting
  • Optimizing hyperparameters

We’ve covered a lot of ground so far, including deep neural network models such as RNNs, CNNs, and the Transformer, and modern NLP frameworks such as AllenNLP and Hugging Face Transformers. However, we’ve paid little attention to the details of training and inference. For example, how do you train and make predictions efficiently? How do you keep your model from overfitting? How do you optimize hyperparameters? These factors can have a huge impact on the final performance and generalizability of your model. This chapter covers the topics you need to consider to build robust, accurate NLP applications that perform well in the real world.

10.1 Batching instances

In chapter 2, we briefly mentioned batching, a machine learning technique in which instances are grouped into batches and sent to the processor (a CPU or, more often, a GPU) together. Batching is almost always necessary when training large neural networks, because it is critical for efficient and stable training. In this section, we’ll dive deeper into the techniques and considerations involved in batching: padding, sorting, and masking.
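To make this concrete, here is a minimal sketch (not code from AllenNLP or the rest of this book) of how variable-length token ID sequences can be grouped into a single padded batch with a mask, using plain PyTorch. The token IDs, the pad_id value, and the make_batch helper are invented here purely for illustration.

import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical token ID sequences of different lengths
# (in practice, these would come from a tokenizer and a vocabulary).
instances = [
    torch.tensor([4, 15, 8, 2]),
    torch.tensor([7, 3]),
    torch.tensor([9, 21, 5, 6, 11]),
]

def make_batch(seqs, pad_id=0):
    """Pad sequences to a common length and record which positions hold real tokens."""
    batch = pad_sequence(seqs, batch_first=True, padding_value=pad_id)
    mask = batch != pad_id            # True for real tokens, False for padding
    return batch, mask

batch, mask = make_batch(instances)
print(batch.shape)                    # torch.Size([3, 5]): 3 instances padded to length 5
print(mask)

The padded tensor is what actually gets sent to the GPU, and the mask is what the padding and masking techniques discussed in the following subsections rely on, so that padded positions do not influence the model’s computations or the loss.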

10.1.1 Padding

10.1.2 Sorting

10.1.3 Masking

10.2 Tokenization for neural models

10.2.1 Unknown words

10.2.2 Character models

10.2.3 Subword models

10.3 Avoiding overfitting

10.3.1 Regularization

10.3.2 Early stopping

10.3.3 Cross-validation