3 Dimensionality reduction

This chapter covers

  • The curse of dimensionality and its disadvantages
  • Various methods of reducing dimensions
  • Principal component analysis
  • Singular value decomposition
  • Python solutions for both principal component analysis and singular value decomposition
  • A case study on dimension reduction

Knowledge is a process of piling up facts; wisdom lies in their simplification.
—Martin H. Fischer

We face complex situations in life. Life throws multiple options at us, and we shortlist a few viable ones. That shortlist is based on the significance, feasibility, utility, and perceived profit of each option; the ones that fit the bill are chosen. Selecting a vacation destination is a good example: based on the weather, travel time, safety, food, budget, and several other factors, we pick a few places where we would like to spend our next vacation. In this chapter, we study precisely the same problem, how to reduce the number of options, albeit in the world of data science and machine learning.

3.1 Technical toolkit

3.2 The curse of dimensionality

3.3 Dimension reduction methods

3.3.1 Mathematical foundation

3.4 Manual methods of dimensionality reduction

3.4.1 Manual feature selection

3.4.2 Correlation coefficient

3.4.3 Algorithm-based methods for reducing dimensions

3.5 Principal component analysis

3.5.1 Eigenvalue decomposition

3.5.2 Python solution using PCA

3.6 Singular value decomposition

3.6.1 Python solution using SVD

3.7 Pros and cons of dimensionality reduction

3.8 Case study for dimension reduction

3.9 Concluding thoughts

3.10 Practical next steps and suggested readings

Summary