Chapter 13. Using principal component analysis to simplify data

 

This chapter covers

  • Dimensionality reduction techniques
  • Principal component analysis
  • Reducing the dimensionality of semiconductor data

Assume for a moment that you’re watching a ball game on a flat monitor rather than in person. The monitor probably contains a million pixels, and the ball is represented by, say, a thousand of them. In most sports, we’re concerned with the position of the ball at a given time, so to follow what’s going on, your brain needs to track the ball’s position on the playing field. You do this naturally, without even thinking about it. Behind the scenes, you’re converting the million pixels on the monitor into a three-dimensional image showing the ball’s position on the playing field, in real time. You’ve reduced the data from one million dimensions to three.

In this sports match example, you’re presented with a million pixels, but it’s the ball’s three-dimensional position that’s important. Extracting those few important values from the many presented is known as dimensionality reduction: you’re reducing the data from one million values to the three relevant ones. Data in fewer dimensions is much easier to work with. In addition, the relevant features may not be explicitly presented in the data, just as the ball’s position isn’t stored in any single pixel. Often, we have to identify the relevant features before we can begin to apply other machine learning algorithms.
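To make the idea concrete, here is a minimal sketch in Python with NumPy. It is a toy illustration under stated assumptions, not the method developed later in the chapter: the data is synthetic, the names (`positions`, `mixing`, `observed`) are hypothetical, and the 50 observed values are assumed to be linear mixes of a hidden three-dimensional position plus a little noise. Real pixels relate to the ball’s position in a far less simple way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: 200 observations of a hidden 3-D "ball position".
n_samples, n_hidden, n_observed = 200, 3, 50
positions = rng.normal(size=(n_samples, n_hidden))

# Assumption: each observation is recorded as 50 values that are
# linear mixes of the hidden position, plus a small amount of noise.
mixing = rng.normal(size=(n_hidden, n_observed))
noise = 0.01 * rng.normal(size=(n_samples, n_observed))
observed = positions @ mixing + noise

# Center the data, then find the directions of greatest variance
# with a singular value decomposition.
centered = observed - observed.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of the total variance captured by each direction.
explained = singular_values**2 / np.sum(singular_values**2)

# Keep only the top three directions: 50 values per row become 3.
reduced = centered @ vt[:3].T

print(observed.shape, "->", reduced.shape)
print("variance kept by 3 dimensions:", explained[:3].sum())
```

Because the synthetic data really has only three underlying degrees of freedom, three directions should capture nearly all of the variance, even though every row started out with 50 values.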

13.1. Dimensionality reduction techniques

13.2. Principal component analysis

13.3. Example: using PCA to reduce the dimensionality of semiconductor manufacturing data

13.4. Summary