
4 Recurrent Neural Networks


This chapter covers:

  • Weight sharing and its importance for processing sequence data
  • Representing sequence problems in deep learning
  • Combining RNNs with fully connected layers to form a prediction model
  • Using packing in PyTorch to get batched training working on sequence problems

The previous chapter showed us how to develop neural networks for data with a specific type of structure: spatial locality. Specifically, we learned how the convolution operator endowed our neural network with a prior that items near each other are related, while items far apart have no relationship. This allowed us to build neural networks that learned faster and found more accurate solutions for classifying images.

Now we want to develop models that can handle a new type of structure: sequences, where we have T items that occur in a specific order. For example, the alphabet “a, b, c, d, …” is a sequence of 26 characters. Each sentence of this book could be thought of as a sequence of words or a sequence of characters. If you wanted to predict the weather, you could use the temperature at every hour as a sequence and try to predict the temperature in the future. As long as each item in the sequence can be represented as a vector x, we will be able to use a sequence-based model to learn over it. For example, videos can be treated as a sequence of images, where a CNN converts each image into a vector.[1]
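To make this concrete, here is a minimal sketch of how two of these examples could be represented as tensors in PyTorch. The variable names and values are illustrative assumptions, not code from later in the chapter: a sequence is simply T items stacked in order, each item a vector (or an index that an embedding layer, covered in section 4.2.2, will turn into a vector).

```python
import torch

# Hourly temperatures over one day: T=24 time steps, each item a
# 1-dimensional vector, giving a tensor of shape (T, D) = (24, 1).
# (These temperature values are made up for illustration.)
temps = torch.tensor([[15.0], [14.5], [14.0], [13.8], [14.2], [15.1],
                      [16.7], [18.3], [20.0], [21.4], [22.5], [23.1],
                      [23.8], [24.0], [23.6], [22.9], [21.7], [20.2],
                      [18.9], [17.8], [17.0], [16.4], [15.9], [15.4]])
print(temps.shape)  # torch.Size([24, 1])

# A sentence as a sequence of characters: map each character to an
# integer index, one index per time step. An embedding layer can
# later convert each index into a learned vector x.
sentence = "hello world"
vocab = {c: i for i, c in enumerate(sorted(set(sentence)))}
char_ids = torch.tensor([vocab[c] for c in sentence])
print(char_ids.shape)  # torch.Size([11])
```

Either way, the model sees the same thing: an ordered collection of T items it can process one step at a time.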

4.1 Recurrent Neural Networks as Weight Sharing

4.1.1 Weight Sharing

4.1.2 Weight Sharing Over Time

4.2 RNNs in PyTorch

4.2.1 A Simple Sequence Classification Problem

4.2.2 Embedding Layers

4.2.3 Last Time Step

4.3 Improving Training Time With Packing

4.3.1 Packable Embedding Layer

4.3.2 Training a Batched RNN

4.3.3 Simultaneous Packed & Unpacked Inputs

4.4 More Complex RNNs

4.4.1 Multiple Layers

4.4.2 Bidirectional RNNs

4.5 Exercises

4.6 Summary