Chapter 14. Learning to write like Shakespeare: long short-term memory

 

In this chapter

  • Character language modeling
  • Truncated backpropagation
  • Vanishing and exploding gradients
  • A toy example of RNN backpropagation
  • Long short-term memory (LSTM) cells

“Lord, what fools these mortals be!”

William Shakespeare, A Midsummer Night’s Dream

Character language modeling

Let’s tackle a more challenging task with the RNN

At the end of chapters 12 and 13, you trained vanilla recurrent neural networks (RNNs) that learned a simple series prediction problem. But you were training on a toy dataset of phrases that were synthetically generated using rules.

In this chapter, you’ll attempt language modeling over a much more challenging dataset: the works of Shakespeare. And instead of learning to predict the next word given the previous words (as in the preceding chapter), the model will train on characters. It needs to learn to predict the next character given the previous characters observed. Here’s what I mean:

import sys, random, math
from collections import Counter
import numpy as np

np.random.seed(0)

f = open('shakespear.txt', 'r')
raw = f.read()                              # read the entire corpus into one string
f.close()

vocab = list(set(raw))                      # every unique character becomes a vocabulary entry
word2index = {}
for i, word in enumerate(vocab):
    word2index[word] = i                    # map each character to an integer index
indices = np.array(list(map(lambda x: word2index[x], raw)))   # the corpus as a sequence of indices
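
To make the next-character objective concrete, here is a minimal sketch of how input/target pairs can be formed from the indices array built above. It isn't the chapter's final training code; seq_len, inputs, and targets are illustrative names. The target at every position is simply the next character in the text.

seq_len = 32                                    # illustrative context length
inputs  = indices[:-1]                          # characters the model conditions on
targets = indices[1:]                           # the "next character" at each position

n_examples = len(inputs) // seq_len             # group the stream into fixed-length examples
inputs  = inputs[:n_examples * seq_len].reshape(n_examples, seq_len)
targets = targets[:n_examples * seq_len].reshape(n_examples, seq_len)

print(inputs.shape, targets.shape)              # both (n_examples, seq_len)

Each row of targets is the matching row of inputs shifted one character to the left, which is exactly the "predict the next character given the previous characters" task described above.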

The need for truncated backpropagation

Truncated backpropagation
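
As a rough sketch of the idea (reusing the indices array from the previous section; batch_size and bptt are illustrative hyperparameters, not the chapter's final ones), truncated backpropagation chops the corpus into several parallel streams and backpropagates only within short windows of those streams:

batch_size = 32                                  # number of parallel text streams (illustrative)
bptt = 16                                        # backpropagate through at most 16 time steps

n_batches = len(indices) // batch_size
trimmed = indices[:n_batches * batch_size]       # trim so the corpus divides evenly

# Column b of batched is a contiguous slice of the text;
# each row is one time step across all batch_size streams.
batched = trimmed.reshape(batch_size, n_batches).T

for start in range(0, n_batches - 1, bptt):
    seq = min(bptt, n_batches - 1 - start)       # the last window may be shorter
    batch_input  = batched[start:start + seq]            # shape: (seq, batch_size)
    batch_target = batched[start + 1:start + 1 + seq]    # the next character at each step
    # the forward pass, loss, and backpropagation are confined to this window,
    # so gradients never flow across more than bptt time steps

The point is that gradients stop at the window boundaries, so memory and compute stay bounded no matter how long the corpus is.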

A sample of the output

Vanishing and exploding gradients

A toy example of RNN backpropagation
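
As a standalone illustration of the problem (the 2 x 2 weight matrix and starting activations here are arbitrary, chosen only to make the effect visible), repeatedly multiplying by a recurrent weight matrix and an activation's derivative either shrinks a gradient toward zero or blows it up:

import numpy as np

sigmoid = lambda x: 1 / (1 + np.exp(-x))
relu    = lambda x: (x > 0).astype(float) * x

weights = np.array([[1, 4],
                    [4, 1]])                        # arbitrary recurrent weight matrix

print("Sigmoid activations")
activations = list()
activation = sigmoid(np.array([1, 0.01]))
for iteration in range(10):
    activation = sigmoid(activation.dot(weights))   # repeated recurrent update
    activations.append(activation)
    print(activation)

print("\nSigmoid gradients")
gradient = np.ones_like(activation)
for activation in reversed(activations):
    gradient = (activation * (1 - activation) * gradient)  # sigmoid derivative is at most 0.25
    gradient = gradient.dot(weights.transpose())            # backward through the weights
    print(gradient)                                         # shrinks toward zero: vanishing

print("\nRelu activations")
activations = list()
activation = relu(np.array([1, 0.01]))
for iteration in range(10):
    activation = relu(activation.dot(weights))
    activations.append(activation)
    print(activation)

print("\nRelu gradients")
gradient = np.ones_like(activation)
for activation in reversed(activations):
    gradient = ((activation > 0) * gradient).dot(weights.transpose())  # relu derivative is 0 or 1
    print(gradient)                                                    # grows without bound: exploding

With sigmoid, the derivative term keeps scaling the gradient down, so early time steps receive almost no learning signal; with relu and these weights, the same loop makes the gradient grow without bound.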

Long short-term memory (LSTM) cells
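
Before digging into the intuition, here is a minimal numpy sketch of a single LSTM step using the standard forget/input/output-gate equations. The parameter names (W, U, b) and sizes are illustrative, and this is not the chapter's framework code:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b stack the parameters for the forget (f), input (i),
    # output (o), and candidate (g) transformations.
    z = x.dot(W) + h_prev.dot(U) + b            # all four pre-activations at once
    f, i, o, g = np.split(z, 4, axis=-1)
    f = sigmoid(f)                              # forget gate: how much of c_prev to keep
    i = sigmoid(i)                              # input gate: how much new information to write
    o = sigmoid(o)                              # output gate: how much of the cell to expose
    g = np.tanh(g)                              # candidate update to the cell
    c = f * c_prev + i * g                      # new cell state: the long-term memory
    h = o * np.tanh(c)                          # new hidden state: the cell's output
    return h, c

n_in, n_hidden = 5, 8                           # illustrative sizes
W = np.random.randn(n_in, 4 * n_hidden) * 0.1
U = np.random.randn(n_hidden, 4 * n_hidden) * 0.1
b = np.zeros(4 * n_hidden)

h = np.zeros((1, n_hidden))
c = np.zeros((1, n_hidden))
x = np.random.randn(1, n_in)
h, c = lstm_step(x, h, c, W, U, b)              # one time step of the recurrence

The key difference from the vanilla RNN is the cell state c: because it is updated by elementwise gating rather than by a repeated matrix multiplication and squashing, gradients can flow through it across many time steps without vanishing.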

Some intuition about LSTM gates

The long short-term memory layer

Upgrading the character language model

Training the LSTM character language model

Tuning the LSTM character language model

Summary