Chapter 13. Using principal component analysis to simplify data
This chapter covers
• Dimensionality reduction techniques
• Principal component analysis
Assume for a moment that you’re watching a sports match involving a ball on a flat monitor rather than in person. The monitor probably contains about a million pixels, and the ball is represented by, say, a thousand of them. In most sports we care about the position of the ball at a given time, so to follow what’s going on, your brain tracks the ball’s position on the playing field. You do this naturally, without even thinking about it. Behind the scenes, you’re converting the million pixels on the monitor into a three-dimensional picture of the ball’s position on the playing field, in real time. You’ve reduced the data from one million dimensions to three.
In this sports match example, you’re presented with a million pixels, but it’s the ball’s three-dimensional position that’s important. This is known as dimensionality reduction: you’re reducing the data from roughly one million values to the three relevant ones. Data in fewer dimensions is much easier to work with. In addition, the relevant features may not be explicitly present in the data; often we have to identify them before we can apply other machine learning algorithms.
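To make the idea concrete, here is a minimal NumPy sketch of the kind of dimensionality reduction this chapter builds up to: centering the data, finding the directions of greatest variance, and projecting onto just a few of them. The function name pca, the parameter top_n_feat, and the random example data are illustrative assumptions for this sketch, not the implementation developed later in the chapter.

import numpy as np

def pca(data_mat, top_n_feat=3):
    """Project data_mat (one sample per row) onto its top_n_feat principal components."""
    # Center the data by subtracting each feature's mean
    mean_removed = data_mat - data_mat.mean(axis=0)
    # Covariance matrix of the centered data (features as columns)
    cov_mat = np.cov(mean_removed, rowvar=False)
    # Eigendecomposition of the symmetric covariance matrix
    eig_vals, eig_vects = np.linalg.eigh(cov_mat)
    # eigh returns eigenvalues in ascending order; keep the largest top_n_feat
    top_idx = np.argsort(eig_vals)[::-1][:top_n_feat]
    red_eig_vects = eig_vects[:, top_idx]
    # Project the centered data onto the reduced basis
    return mean_removed @ red_eig_vects

# Example: reduce 1,000 samples of 50-dimensional data to 3 dimensions
high_d = np.random.randn(1000, 50)
low_d = pca(high_d, top_n_feat=3)
print(low_d.shape)  # (1000, 3)

Running this turns each 50-value sample into just 3 values while keeping the directions along which the data varies most, which is exactly the kind of simplification the rest of the chapter explores.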