14 Text classification

 

This chapter covers

  • An introduction to the field of Natural Language Processing (NLP)
  • Preprocessing raw text into numeric input
  • Building simple text classification models

This chapter lays the foundation for working with text input, which we will build on in the next two chapters. By the end of this chapter, you will be able to build a simple text classifier in several different ways. This sets the stage for building more complicated models, like the Transformer, in the next chapter.
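As a preview of the simplest approach covered later in the chapter — tokenizing text and counting word occurrences to get a bag-of-words representation — here is a minimal sketch in plain Python. The function names, the punctuation handling, and the tiny corpus are all illustrative assumptions, not the chapter's actual code.

```python
# Minimal bag-of-words sketch (illustrative; names and corpus are hypothetical).
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens, stripping punctuation."""
    return [token.strip(".,!?;:\"'()") for token in text.lower().split()]

def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts over a fixed vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

# Build a vocabulary from a tiny corpus, then vectorize a new review.
corpus = ["A great movie!", "A terrible, boring movie."]
vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
vector = bag_of_words("A boring movie", vocab)
# vector is [1, 1, 0, 1, 0] for vocab ["a", "boring", "great", "movie", "terrible"]
```

Such a count vector discards word order entirely, which is exactly the set-versus-sequence distinction the chapter develops: bag-of-words and bigram models treat a document as a set of tokens, while recurrent models consume it as a sequence.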

14.1 A brief history of Natural Language Processing

14.2 Preparing text data

14.2.1 Character and word tokenization

14.2.2 Subword tokenization

14.3 Sets vs. sequences

14.3.1 Loading the IMDb classification dataset

14.4 Set models

14.4.1 Training a bag-of-words model

14.5 Training a bigram model

14.6 Sequence models

14.6.1 Training a recurrent model

14.6.2 Understanding word embeddings

14.6.3 Using a word embedding

14.7 Pretraining a word embedding

14.8 Using the pretrained embedding for classification

14.9 Chapter summary