
5 Classifying by maximizing class separation: discriminant analysis


This chapter covers:

  • What discriminant analysis is
  • Linear and quadratic discriminant analysis
  • Building discriminant analysis classifiers to classify wines

Discriminant analysis is an umbrella term for multiple algorithms that solve classification problems (where we wish to predict a categorical variable) in a similar way. While the various discriminant analysis algorithms learn slightly differently, they all find a new representation of the original data that maximizes the separation between the classes.

Recall from chapter 1 that predictor variables are the variables we hope contain the information needed to make predictions on new data. Discriminant analysis algorithms find a new representation of the predictor variables (which must be continuous) by combining them into new variables that best discriminate the classes. This combination often has the handy benefit of condensing many predictors into a much smaller set of variables. Because of this, despite discriminant analysis algorithms being classification algorithms, they are similar to some of the dimension reduction algorithms we'll meet in part 4 of the book.
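As a minimal sketch of this idea (using Python and scikit-learn rather than this book's own code), the snippet below fits a linear discriminant analysis model to the wine dataset and shows how its 13 continuous predictors are combined into just 2 new discriminant variables:

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# The wine dataset: 178 samples, 13 continuous predictors, 3 classes
X, y = load_wine(return_X_y=True)

# Fit LDA and project the data onto its discriminant functions
lda = LinearDiscriminantAnalysis()
X_new = lda.fit_transform(X, y)

# With k classes, LDA yields at most k - 1 discriminant functions,
# so the 13 predictors are condensed into 2 new variables here.
print(X.shape)      # (178, 13)
print(X_new.shape)  # (178, 2)
```

Note that the new variables are linear combinations of the originals chosen to separate the classes, which is what makes discriminant analysis double as a (supervised) dimension reduction technique.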

Important

Dimension reduction is the process of learning how the information in a set of variables can be condensed into a smaller number of variables, with as little information loss as possible.

5.1  What is discriminant analysis?

5.1.1  How does discriminant analysis learn?

5.1.2  What if I have more than two classes?

5.1.3  Learning curves instead of straight lines: QDA

5.1.4  How do LDA and QDA make predictions?

5.2  Building our first linear and quadratic discriminant models

5.2.1  Loading and exploring the wine dataset

5.2.2  Plotting the data

5.2.3  Training the models

5.3  Strengths and weaknesses of LDA and QDA

5.4  Summary

5.5  Solutions to exercises