Grokking Deep Learning

This is an excerpt from Manning's book Grokking Deep Learning.

One of the most famous and successful ways of generating vectors for sequences (such as a sentence) is a recurrent neural network (RNN). To show you how it works, we'll start by coming up with a new, and seemingly wasteful, way of doing the average word embeddings using something called an identity matrix. An identity matrix is a square matrix (num rows == num columns) of any size, filled with 0s except for 1s running along the diagonal from the top-left corner to the bottom-right corner, as in the examples shown here.
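As a quick sketch (using NumPy, with a made-up 3-dimensional vector for illustration), an identity matrix can be built with `np.eye`, and vector-matrix multiplication by it returns the input vector unchanged:

```python
import numpy as np

# A 3x3 identity matrix: 1s on the diagonal, 0s everywhere else
identity = np.eye(3)

# Multiplying any vector by the identity matrix returns that same vector
v = np.array([0.1, 0.2, 0.3])
result = v.dot(identity)
```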

You may think identity matrices are useless. What’s the purpose of a matrix that takes a vector and outputs that same vector? In this case, we’ll use it as a teaching tool to show how to set up a more complicated way of summing the word embeddings so the neural network can take order into account when generating the final sentence embedding. Let’s explore another way of summing embeddings.

This is the standard technique for summing multiple word embeddings together to form a sentence embedding (dividing by the number of words gives the average sentence embedding). The example on the right adds a step between each sum: vector-matrix multiplication by an identity matrix.
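In code, the standard technique looks something like the following minimal sketch. The embeddings here are hypothetical 3-dimensional vectors with made-up values, used only to illustrate the sum-and-divide step:

```python
import numpy as np

# Hypothetical word embeddings (values made up for illustration)
word_vects = {
    "Red":    np.array([0.1, 0.2, 0.3]),
    "Sox":    np.array([0.4, 0.1, 0.0]),
    "defeat": np.array([0.2, 0.5, 0.1]),
}

words = ["Red", "Sox", "defeat"]

# Sum the word embeddings to form the sentence embedding...
sentence_sum = sum(word_vects[w] for w in words)

# ...and divide by the number of words for the average sentence embedding
sentence_avg = sentence_sum / len(words)
```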

The vector for “Red” is multiplied by an identity matrix, and then the output is summed with the vector for “Sox,” which is then vector-matrix multiplied by the identity matrix and added to the vector for “defeat,” and so on throughout the sentence. Note that because the vector-matrix multiplication by the identity matrix returns the same vector that goes into it, the process on the right yields exactly the same sentence embedding as the process at top left.
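That chain can be sketched directly (again with hypothetical 3-dimensional embeddings whose values are invented for illustration), confirming that interleaving the identity multiplication changes nothing:

```python
import numpy as np

identity = np.eye(3)

# Hypothetical word embeddings (values made up for illustration)
red    = np.array([0.1, 0.2, 0.3])
sox    = np.array([0.4, 0.1, 0.0])
defeat = np.array([0.2, 0.5, 0.1])

# Multiply the running sum by the identity matrix before each addition
chained = red.dot(identity) + sox
chained = chained.dot(identity) + defeat

# The plain sum, with no matrix multiplications at all
plain = red + sox + defeat

# Because the identity multiplication returns its input unchanged,
# both approaches yield exactly the same sentence embedding.
```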

Yes, this is wasteful computation, but that's about to change. The key observation is this: if the matrix used were anything other than the identity matrix, changing the order of the words would change the resulting embedding. Let's see this in Python.
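As a rough sketch of that experiment (the transition matrix and embeddings below are invented for illustration, not the book's own values): with the identity matrix, reordering the words leaves the embedding unchanged, but with an arbitrary matrix the order matters.

```python
import numpy as np

# Hypothetical word embeddings (values made up for illustration)
word_vects = {
    "Red":    np.array([0.1, 0.2, 0.3]),
    "Sox":    np.array([0.4, 0.1, 0.0]),
    "defeat": np.array([0.2, 0.5, 0.1]),
}

def embed(words, matrix):
    # Multiply the running sum by `matrix` before adding each next word
    out = word_vects[words[0]]
    for word in words[1:]:
        out = out.dot(matrix) + word_vects[word]
    return out

np.random.seed(0)
random_matrix = np.random.rand(3, 3)  # any matrix other than the identity

forward  = ["Red", "Sox", "defeat"]
backward = ["defeat", "Sox", "Red"]

# With the identity matrix, both orders give the same embedding;
# with the random matrix, the two orders give different embeddings.
same      = np.allclose(embed(forward, np.eye(3)), embed(backward, np.eye(3)))
different = not np.allclose(embed(forward, random_matrix),
                            embed(backward, random_matrix))
```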
