This chapter covers
- Understanding dimension reduction
- Dealing with high dimensionality and collinearity
- Using principal component analysis to reduce dimensionality
Dimension reduction comprises a number of approaches that turn a set of (potentially many) variables into a smaller number of variables that retain as much of the original, multidimensional information as possible. We sometimes want to reduce the number of dimensions in a dataset, either to help us visualize the relationships in the data or to avoid the problems that arise in high dimensions, where data become sparse and variables are often collinear. So dimension reduction is a critical skill to add to your machine learning toolbox!
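To make the idea concrete, here is a minimal sketch of reducing two strongly correlated variables to a single new variable using principal component analysis. It relies only on the Python standard library and uses the closed-form eigendecomposition of a 2 x 2 covariance matrix, a simplification chosen for illustration. The function name `first_principal_component` and the five data points are made up for this example and are not taken from any library or dataset.

```python
import math


def first_principal_component(points):
    """Project 2-D points onto their first principal component.

    Illustrative helper (hypothetical name). Expects at least two
    points that are not all identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Center the data so each variable has mean zero.
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample variances and covariance (entries of the covariance matrix).
    var_x = sum(cx * cx for cx, _ in centered) / (n - 1)
    var_y = sum(cy * cy for _, cy in centered) / (n - 1)
    cov_xy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Eigenvalues of the symmetric matrix [[var_x, cov_xy], [cov_xy, var_y]].
    half_trace = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    largest = half_trace + spread
    smallest = half_trace - spread

    # Eigenvector belonging to the largest eigenvalue.
    if cov_xy != 0:
        direction = (cov_xy, largest - var_x)
    elif var_x >= var_y:
        direction = (1.0, 0.0)
    else:
        direction = (0.0, 1.0)
    norm = math.hypot(direction[0], direction[1])
    direction = (direction[0] / norm, direction[1] / norm)

    # One score per case: the new, single variable.
    scores = [cx * direction[0] + cy * direction[1] for cx, cy in centered]

    # Share of the total variance kept by the single component.
    explained = largest / (largest + smallest)
    return direction, scores, explained


# Made-up data in which y is roughly twice x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores, explained = first_principal_component(points)

print("direction:", direction)
print("scores:", scores)
print("proportion of variance retained:", explained)
```

Because the two variables here are almost perfectly collinear, one component retains nearly all of the variance, so each case can be described by one number instead of two with very little information lost.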