12 Network design alternatives to RNNs

 

This chapter covers

  • Working around the limitations of RNNs
  • Adding time to a model using positional encodings
  • Adapting CNNs to sequence-based problems
  • Extending attention to multiheaded attention
  • Understanding transformers

Recurrent neural networks, and LSTMs in particular, have been used for classifying and otherwise working with sequence data for over two decades. While they have long been reliable tools for the task, they have several undesirable properties. First, RNNs are just plain slow: they take a long time to train, which means waiting around for results. Second, they do not scale well with more layers (making it hard to improve model accuracy) or with more GPUs (making it hard to train them faster). We have learned many techniques, such as skip connections and residual layers, for getting fully connected and convolutional networks to train with more layers and produce better results. But RNNs just do not seem to like being deep: you can add more layers and skip connections, but accuracy does not improve to the same degree.
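
To make the first problem concrete, here is a minimal PyTorch sketch (the tensor shapes and variable names are arbitrary, chosen only for illustration, not taken from the chapter's code). Because the hidden state at step t depends on the hidden state at step t-1, an RNN must process the time steps one after another, so the work cannot be spread across the sequence dimension the way a convolution or a fully connected layer can.

import torch
from torch import nn

B, T, D, H = 32, 128, 64, 64   # batch size, time steps, input size, hidden size (illustrative)
x = torch.randn(B, T, D)       # a batch of input sequences
cell = nn.RNNCell(D, H)        # a single recurrent cell

h = torch.zeros(B, H)          # initial hidden state
for t in range(T):             # T strictly sequential steps: the bottleneck
    h = cell(x[:, t, :], h)    # h_t depends on h_{t-1}, so no parallelism over time

Adding more GPUs does not speed up this loop, because every iteration has to wait for the previous one to finish; working around that limitation is what the rest of this chapter is about.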

12.1 TorchText: Tools for text problems

12.1.1  Installing TorchText

12.1.2  Loading datasets in TorchText

12.1.3  Defining a baseline model

12.2 Averaging embeddings over time

12.2.1  Weighted average over time with attention

12.3 Pooling over time and 1D CNNs

12.4 Positional embeddings add sequence information to any model

12.4.1  Implementing a positional encoding module

12.4.2  Defining positional encoding models

12.5 Transformers: Big models for big data

12.5.1  Multiheaded attention

12.5.2  Transformer blocks

Exercises

Summary