12 Network Design Alternatives to RNNs
This chapter covers
- Working around the limitations of RNNs.
- Adding time to a model using positional encodings.
- Adapting CNNs to sequence-based problems.
- Extending attention to multi-headed attention.
- Understanding Transformers, a new family of network architectures.
Recurrent Neural Networks, in particular LSTMs, have been used for classifying and working with sequence problems for over two decades. While they have long been reliable tools for the task, they have a number of undesirable properties. First, RNNs are just plain slow: they take a long time to train, which means waiting around for results. Second, they do not tend to scale well with more layers (making it hard to improve model accuracy) or with more GPUs (making it hard to train faster). With skip connections and residual layers, we have learned many ways to get fully-connected and convolutional networks to train with more layers and get better results. But RNNs just do not seem to like being deep. You can add more layers and skip connections, but they do not show the same degree of benefit, such as improved accuracy.
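To see why RNNs are slow, consider a minimal sketch (assuming PyTorch, with hypothetical sizes chosen for illustration) of running an LSTM cell over a sequence. Each step's hidden state depends on the previous step's hidden state, so the loop over time steps must run one iteration after another and cannot be parallelized across the sequence the way convolutional or attention layers can.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: batch, sequence length, input features, hidden size
B, T, D, H = 32, 128, 64, 256
x = torch.randn(B, T, D)         # a batch of input sequences
cell = nn.LSTMCell(D, H)         # one LSTM cell, applied step by step

h = torch.zeros(B, H)            # initial hidden state
c = torch.zeros(B, H)            # initial cell state
for t in range(T):
    # Step t cannot start until step t-1 has produced (h, c),
    # so these T steps run sequentially no matter how many GPUs we have.
    h, c = cell(x[:, t, :], (h, c))
# h now holds the final hidden state after T strictly sequential steps.
```

This sequential dependency is the bottleneck the architectures in this chapter are designed to avoid.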