4 Linear algebraic tools in machine learning

 

This chapter covers

  • Quadratic forms
  • Applying principal component analysis (PCA) in data science
  • Document retrieval as a machine learning application

Finding patterns in large volumes of high-dimensional data is the name of the game in machine learning and data science. Data often appears in the form of large matrices (a toy example is shown in section 2.3 and in equation 2.1). The rows of the data matrix represent feature vectors for individual input instances. Hence, the number of rows matches the number of observed input instances, and the number of columns matches the size of the feature vector, that is, the number of dimensions of the feature space. Geometrically speaking, each feature vector (that is, each row of the data matrix) represents a point in feature space. These points are not distributed uniformly over the space. Rather, the points belonging to a specific class occupy a small subregion of that space. This gives rise to characteristic structures in the data matrix. Linear algebra provides the tools needed to study these matrix structures.
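
To make this layout concrete, here is a minimal sketch (using a small hypothetical dataset, not one of the book's examples) of a data matrix in PyTorch, where each row holds the feature vector of one input instance:

import torch

# Hypothetical toy data matrix: 4 input instances, each described by
# a 3-dimensional feature vector. Each row is one instance's feature
# vector; each column is one feature dimension.
X = torch.tensor([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.3, 3.3, 6.0],
    [5.8, 2.7, 5.1],
])

num_instances, num_dims = X.shape
print(num_instances, num_dims)  # 4 3

The shape of the matrix directly encodes the two quantities just described: the number of observed instances (rows) and the dimensionality of the feature space (columns).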

In this chapter, we study linear algebraic tools for analyzing matrix structures. The chapter presents some intricate mathematics, and we encourage you to persevere through it, including the theorem proofs. An intuitive understanding of the proofs will give you significantly better insight into the rest of the chapter.

4.1 Distribution of feature data points and true dimensionality

 
 

4.2 Quadratic forms and their minimization

 
 
 

4.2.1 Minimizing quadratic forms

 
 
 

4.2.2 Symmetric positive (semi)definite matrices

 
 

4.3 Spectral and Frobenius norms of a matrix

 

4.3.1 Spectral norms

 
 
 
 

4.3.2 Frobenius norms

 
 

4.4 Principal component analysis

 
 
 
 

4.4.1 Direction of maximum spread

 
 
 
 

4.4.2 PCA and dimensionality reduction

 
 
 

4.4.3 PyTorch code: PCA and dimensionality reduction

 
 
 

4.4.4 Limitations of PCA

 
 

4.4.5 PCA and data compression

 
 

4.5 Singular value decomposition

 

4.5.1 Informal proof of the SVD theorem

 
 

4.5.2 Proof of the SVD theorem

 

4.5.3 Applying SVD: PCA computation

 
 

4.5.4 Applying SVD: Solving arbitrary linear systems

 
 
 
 

4.5.5 Rank of a matrix

 
 
 

4.5.6 PyTorch code for solving linear systems with SVD

 
 