Chapter 12. Neural networks that write like Shakespeare: recurrent layers for variable-length data

 

In this chapter

  • The challenge of arbitrary length
  • The surprising power of averaged word vectors
  • The limitations of bag-of-words vectors
  • Using identity vectors to sum word embeddings
  • Learning the transition matrices
  • Learning to create useful sentence vectors
  • Forward propagation in Python
  • Forward propagation and backpropagation with arbitrary length
  • Weight update with arbitrary length

“There’s something magical about Recurrent Neural Networks.”

Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks,” http://mng.bz/VPW

The challenge of arbitrary length

Let’s model arbitrarily long sequences of data with neural networks!

This chapter builds directly on chapter 11, so I encourage you to make sure you've mastered its concepts and techniques before diving into this one. In chapter 11, you learned about natural language processing (NLP), including how to modify a loss function so that a neural network learns a specific pattern of information in its weights. You also developed an intuition for what a word embedding is and how it can capture shades of similarity to other word embeddings. In this chapter, we'll expand on that intuition of an embedding conveying the meaning of a single word by creating embeddings that convey the meaning of variable-length phrases and sentences.
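To make the goal concrete before we begin, here is a minimal sketch of one simple way to turn a variable-length sentence into a single fixed-size vector: average its word embeddings. The toy vocabulary, the random embeddings, and the name sentence_vector are illustrative assumptions for this sketch, not the chapter's own code; in practice the embeddings would be learned, as in chapter 11.

    import numpy as np

    np.random.seed(1)

    # Toy vocabulary with small, randomly initialized word embeddings
    vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
    word2index = {word: i for i, word in enumerate(vocab)}
    embed_size = 4
    embeddings = np.random.randn(len(vocab), embed_size) * 0.1

    def sentence_vector(sentence):
        # Average the embeddings of the sentence's words into one fixed-size vector
        indices = [word2index[w] for w in sentence.lower().split()]
        return embeddings[indices].mean(axis=0)

    # Sentences of different lengths map to vectors of the same shape
    a = sentence_vector("the cat sat on the mat")
    b = sentence_vector("the dog ran")
    print(a.shape, b.shape)  # (4,) (4,)

    # Cosine similarity lets us compare the two sentence vectors
    cos = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(round(float(cos), 3))

Notice that averaging ignores word order entirely, which is exactly the kind of limitation the bag-of-words sections below examine and the rest of the chapter works to overcome.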

Do comparisons really matter?

The surprising power of averaged word vectors

How is information stored in these embeddings?

How does a neural network use embeddings?

The limitations of bag-of-words vectors

Using identity vectors to sum word embeddings

Matrices that change absolutely nothing

Learning the transition matrices

Learning to create useful sentence vectors

Forward propagation in Python

How do you backpropagate into this?

Let’s train it!

Setting things up

Forward propagation with arbitrary length

Backpropagation with arbitrary length

Weight update with arbitrary length

Execution and output analysis

Summary