4 Recurrent neural networks

 

This chapter covers

  • Weight sharing and processing sequence data
  • Representing sequence problems in deep learning
  • Combining RNNs and fully connected layers for predictions
  • Padding and packing to use sequences of different lengths

The previous chapter showed us how to develop neural networks for a particular type of structure: spatial locality. Specifically, we learned how the convolution operator endowed our neural network with a prior that items near each other are related but items far from each other have no relationship. This allowed us to build neural networks that learned faster and provided more accurate solutions for classifying images.

Now we want to develop models that can handle a new type of structure: sequences with T items that occur in a specific order. For example, the alphabet—a, b, c, d, . . .—is a sequence of 26 characters. Each sentence in this book can be thought of as a sequence of words or a sequence of characters. You could use the temperature every hour as a sequence to try to predict the temperature in the future. As long as each item in the sequence can be represented as a vector x, we can use a sequence-based model to learn over it. For example, videos can be treated as a sequence of images; you could use a convolutional neural network (CNN) to convert each image into a vector.1
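As a minimal sketch of what this representation looks like in PyTorch (the framework used in this chapter), a sequence of T items, each described by a D-dimensional vector x, can be stored as a tensor of shape (T, D), and a batch of B such sequences as a tensor of shape (B, T, D). The feature choices below (hourly weather readings) are only an illustrative assumption, not from the text.

    import torch

    # One sequence of T = 24 hourly readings, each a D = 3 dimensional
    # feature vector x (e.g., temperature, humidity, pressure -- hypothetical features).
    T, D = 24, 3
    sequence = torch.randn(T, D)        # shape (T, D): one item per time step

    # Training code usually processes B sequences at once as a batch.
    B = 8
    batch = torch.randn(B, T, D)        # shape (B, T, D)

    print(sequence.shape, batch.shape)  # torch.Size([24, 3]) torch.Size([8, 24, 3])

The rest of the chapter builds models that consume tensors shaped this way, including the padding and packing tools needed when sequences in a batch have different lengths.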

4.1 Recurrent neural networks as weight sharing

4.1.1  Weight sharing for a fully connected network

4.1.2  Weight sharing over time

4.2 RNNs in PyTorch

4.2.1  A simple sequence classification problem

4.2.2  Embedding layers

4.2.3  Making predictions using the last time step

4.3 Improving training time with packing

4.3.1  Pad and pack

4.3.2  Packable embedding layer

4.3.3  Training a batched RNN

4.3.4  Simultaneous packed and unpacked inputs

4.4 More complex RNNs

4.4.1  Multiple layers

4.4.2  Bidirectional RNNs

Exercises

Summary