- An introduction to the field of natural language processing (NLP)
- Preprocessing text input into numeric input
- Building simple text classification models
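To preview the second topic above, preprocessing text into numeric input can be as simple as tokenizing strings and mapping each token to an integer ID. The following is a minimal sketch in plain Python; the function names (`tokenize`, `build_vocab`, `encode`) are illustrative, not from this chapter, which covers the technique in detail in section 14.2.

```python
# Minimal sketch (plain Python, no libraries): turning raw text into
# numeric input via word-level tokenization and a vocabulary lookup.
# Function names are illustrative placeholders, not from the chapter.

def tokenize(text):
    """Lowercase and split on whitespace, stripping common punctuation."""
    return [w.strip(".,!?\"'") for w in text.lower().split()]

def build_vocab(texts):
    """Map each unique token to an integer index (0 reserved for unknown)."""
    vocab = {"[UNK]": 0}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab):
    """Convert a string into a list of integer token IDs."""
    return [vocab.get(token, 0) for token in tokenize(text)]

corpus = ["The movie was great!", "The movie was terrible."]
vocab = build_vocab(corpus)
print(encode("The movie was great!", vocab))  # [1, 2, 3, 4]
```

Unknown words map to ID 0, which is one common convention; real tokenizers (covered in section 14.2) handle punctuation, casing, and rare words far more carefully.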
This chapter lays the foundation for working with text input, which we will build on in the next two chapters of this book. By the end of this chapter, you will be able to build a simple text classifier in several different ways. This will set the stage for building more complicated models, such as the Transformer, in the next chapter.
14.1 A brief history of natural language processing
14.2 Preparing text data
14.2.1 Character and word tokenization
14.2.2 Subword tokenization
14.3 Sets vs. sequences
14.3.1 Loading the IMDb classification dataset
14.4 Set models
14.4.1 Training a bag-of-words model
14.4.2 Training a bigram model
14.5 Sequence models
14.5.1 Training a recurrent model
14.5.2 Understanding word embeddings
14.5.3 Using a word embedding
14.5.4 Pretraining a word embedding
14.5.5 Using the pretrained embedding for classification
Summary