4 Recurrent neural networks
This chapter covers
- Weight sharing and processing sequence data
- Representing sequence problems in deep learning
- Combining RNNs and fully connected layers for predictions
- Padding and packing to use sequences of different lengths
The previous chapter showed us how to develop neural networks for a particular type of structure: spatial locality. Specifically, we learned how the convolution operator endowed our neural network with a prior that items near each other are related, but items far from each other have no relationship. This allowed us to build neural networks that learned faster and produced more accurate solutions for classifying images.
Now we want to develop models that can handle a new type of structure: sequences with T items that occur in a specific order. For example, the alphabet—a, b, c, d, . . .—is a sequence of 26 characters. Each sentence in this book can be thought of as a sequence of words or a sequence of characters. You could use the temperature every hour as a sequence to try to predict the temperature in the future. As long as each item in the sequence can be represented as a vector x, we can use a sequence-based model to learn over it. For example, videos can be treated as a sequence of images; you could use a convolutional neural network (CNN) to convert each image into a vector.1
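To make this concrete, here is a minimal sketch (assuming PyTorch, with made-up hourly temperature readings) of how a sequence of T items, each represented as a vector x of dimension D, is typically stored as a tensor of shape (T, D), and how a batch of such sequences becomes (B, T, D):

```python
import torch

# A toy sequence: hourly temperatures over T = 6 hours (hypothetical values).
temps = [14.0, 13.5, 13.0, 15.2, 17.8, 19.1]

# Each item in the sequence must be a vector x; here each hour's reading
# is a 1-dimensional vector, so the whole sequence has shape (T, D) = (6, 1).
seq = torch.tensor(temps).unsqueeze(-1)
print(seq.shape)    # torch.Size([6, 1])

# A batch of B such sequences is usually stored as a (B, T, D) tensor.
batch = seq.unsqueeze(0)
print(batch.shape)  # torch.Size([1, 6, 1])
```

The same layout carries over to richer items: if each time step were a word embedding or a CNN-produced image feature of dimension D, only the last axis would grow.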