9 Improving retention with long short-term memory networks

 

This chapter covers

  • Adding deeper memory to recurrent neural nets
  • Gating information inside neural nets
  • Classifying and generating text
  • Modeling language patterns

For all the benefits recurrent neural nets provide for modeling relationships, and therefore possibly causal relationships, in sequence data, they suffer from one main deficiency: a token’s effect is almost completely lost by the time two more tokens have passed.[1] Any effect the first node has on the third node (two time steps after the first) will be thoroughly stepped on by new data introduced in the intervening time step. This behavior is inherent to the basic structure of the net, but it means the net misses a common feature of human language: tokens can be deeply interrelated even when they’re far apart in a sentence.
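To see this loss of memory concretely, here is a minimal NumPy sketch, not taken from the book's listings: the weight matrices, hidden size, and contractive scaling are all illustrative assumptions. It perturbs only the first token fed to a plain recurrent cell and measures how little of that change survives in the hidden state after each additional time step.

import numpy as np

np.random.seed(0)
hidden_size = 8

W_x = np.random.randn(hidden_size, hidden_size) * 0.5   # input weights (illustrative)
W_h = np.random.randn(hidden_size, hidden_size)
W_h *= 0.9 / np.linalg.norm(W_h, 2)                     # keep the recurrence contractive

def run(tokens):
    """Plain recurrent cell: h_t = tanh(W_h @ h_{t-1} + W_x @ x_t)."""
    h = np.zeros(hidden_size)
    for x in tokens:
        h = np.tanh(W_h @ h + W_x @ x)
    return h

tokens = [np.random.randn(hidden_size) for _ in range(6)]
nudged = [t.copy() for t in tokens]
nudged[0] += 1.0                                        # change only the first token

for t in range(1, len(tokens) + 1):
    gap = np.linalg.norm(run(nudged[:t]) - run(tokens[:t]))
    print(f"effect of token 1 on the hidden state after {t} step(s): {gap:.4f}")

With each step, the recurrent weights and the squashing nonlinearity shrink whatever trace of the first token remains. The gates of an LSTM, introduced in section 9.1, are designed to let the net keep that kind of information around for as long as it is useful.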

Take this example:

The young woman went to the movies with her friends.

The subject “woman” immediately precedes its main verb “went.”[2] You learned in previous chapters that both convolutional and recurrent nets would have no trouble learning that relationship.

2  “Went” is the predicate (main verb) in this sentence. Find additional English grammar terminology at https://www.butte.edu/departments/cas/tipsheets/grammar/sentence_structure.html.

But in a similar sentence:

The young woman, having found a free ticket on the ground, went to the movies.

the subject “woman” and its verb “went” are now separated by an entire clause. A basic recurrent net has largely forgotten the subject by the time it reaches the verb, and retaining that kind of long-range dependency is exactly what this chapter’s long short-term memory (LSTM) networks are designed to do.

9.1 LSTM

9.1.1 Backpropagation through time

9.1.2 Where does the rubber hit the road?

9.1.3 Dirty data

9.1.4 Back to the dirty data

9.1.5 Words are hard. Letters are easier.

9.1.6 My turn to chat

9.1.7 My turn to speak more clearly

9.1.8 Learned how to say, but not yet what
