14 Dimension reduction of matrix data


This section covers

  • Simplifying matrices with geometric rotations
  • What is principal component analysis?
  • Advanced matrix operations for reducing matrix size
  • What is singular value decomposition?
  • Dimension reduction using scikit-learn

Dimension reduction is a family of techniques for shrinking data while retaining its information content. These techniques permeate many of our everyday digital activities. Suppose, for instance, that you’ve just returned from a vacation in Belize. There are 10 vacation photos on your phone that you wish to message to a friend. Unfortunately, these photos are quite large, and your current wireless connection is slow. Each photo is 1,200 pixels tall and 1,200 pixels wide. It takes up 5.5 MB of memory and requires 15 seconds to transfer, so transferring all 10 photos will take 2.5 minutes.

Fortunately, your messaging app offers you a better option: you can shrink each photo from 1,200 × 1,200 pixels to 600 × 400 pixels, reducing its pixel count sixfold. By lowering the resolution, you’ll sacrifice a little detail. However, the photos will maintain most of their information—the lush jungles, blue seas, and shimmering sands will remain clearly visible in the images. The trade-off is therefore worth it. Shrinking the data sixfold cuts the transfer time sixfold: it will take just 25 seconds to share the 10 photos with your friend.
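The photo-shrinking idea can be sketched directly in code. The snippet below is a toy illustration (not what a real messaging app does): it downsamples a random NumPy array standing in for a 1,200 × 1,200 grayscale photo by keeping every second row and column, which halves each dimension and quarters the pixel count. The same strategy with a coarser sampling stride produces the larger reductions described above.

```python
import numpy as np

# Simulated 1,200 x 1,200 grayscale photo (random values stand in for pixels).
photo = np.random.rand(1200, 1200)

# Naive downsampling: keep every 2nd row and every 2nd column.
# The coarse structure of the image survives; the fine detail does not.
small = photo[::2, ::2]

print(small.shape)                 # (600, 600)
print(photo.size / small.size)     # 4.0: a fourfold reduction in pixel count
```

Real resizing algorithms average neighboring pixels rather than discarding them outright, but the principle is the same: fewer numbers, most of the information.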

14.1 Clustering 2D data in one dimension

14.1.1 Reducing dimensions using rotation

14.2 Dimension reduction using PCA and scikit-learn

14.3 Clustering 4D data in two dimensions

14.3.1 Limitations of PCA

14.4 Computing principal components without rotation

14.4.1 Extracting eigenvectors using power iteration

14.5 Efficient dimension reduction using SVD and scikit-learn

Summary