chapter twenty one

21 Training linear classifiers with logistic regression

This section covers

Separating data classes with simple linear cuts
What is logistic regression?
Training linear classifiers using scikit-learn
Interpreting the relationship between class prediction and trained classifier parameters

Data classification, much like clustering, can be treated as a geometry problem: points that share a class label tend to cluster together in an abstract space. By measuring the distance between points, we can identify which data points belong to the same cluster or class. However, as we learned in the last section, computing those distances can be costly. Fortunately, it's possible to identify classes without measuring the distance between all pairs of points. We have done this before: in section 14, we examined the customers of a clothing store. Each customer was represented by two features: height and weight. Plotting these features revealed a cigar-shaped plot. We flipped the cigar on its side and sliced it vertically into three segments representing three classes of customers: small, medium, and large.
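The idea of slicing the cigar can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code from section 14. The customer data is synthetic and randomly generated. The "flip" is done here by projecting onto the first principal component via `np.linalg.svd`. The two cutoffs are percentiles chosen so the classes have roughly equal size. The helper `classify` is a hypothetical name introduced only for this example.

```python
import numpy as np

# Synthetic, hypothetical customer data (not from the book): heights in inches
# and weights in pounds that are positively correlated, forming a "cigar" shape.
rng = np.random.default_rng(0)
heights = rng.normal(66, 4, size=500)
weights = 5 * heights - 180 + rng.normal(0, 10, size=500)
points = np.column_stack([heights, weights])

# "Flip the cigar on its side": center the data and project it onto its
# direction of greatest spread (assumed here to be the first principal component).
center = points.mean(axis=0)
_, _, vt = np.linalg.svd(points - center, full_matrices=False)
direction = vt[0]
if direction[0] < 0:  # orient the axis so taller/heavier customers project higher
    direction = -direction
projections = (points - center) @ direction

# "Slice it vertically": two cutoffs on the 1D projection define three classes.
# Percentile cutoffs are an illustrative assumption.
cutoffs = np.percentile(projections, [33.3, 66.7])
labels = np.digitize(projections, cutoffs)  # 0 = small, 1 = medium, 2 = large

def classify(customer):
    """Hypothetical helper: assign a size class using one dot product and two
    comparisons, without measuring distances to any stored customer."""
    projection = (np.asarray(customer, dtype=float) - center) @ direction
    return int(np.digitize(projection, cutoffs))

print(np.bincount(labels))   # roughly equal class sizes
print(classify((60, 120)))   # a short, light customer
print(classify((72, 180)))   # a tall, heavy customer
```

Classifying a new customer this way costs the same regardless of how many customers are stored, whereas a nearest-neighbor lookup would compute a distance to each of the 500 points. The two cutoffs act as linear boundaries in the original height-weight space, which is the kind of simple cut this section generalizes.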

21.1 Linearly separating customers by size

21.2 Training a linear classifier

21.2.1 Improving perceptron performance through standardization

21.3 Improving linear classification with logistic regression

21.3.1 Running logistic regression on more than two features

21.4 Training linear classifiers using scikit-learn

21.4.1 Training multiclass linear models

21.5 Measuring feature importance with coefficients

21.6 Linear classifier limitations

Summary