8 Yes, no, or maybe so? Logistic regression and classification

This chapter covers

  • What classification models are and how they differ from regression models
  • How to perform a logistic regression and validate classification models
  • The machine learning workflow and how it differs from traditional statistics

Classifying means choosing a label for something, such as whether an email is spam or not spam, or whether an image contains a dog, cat, or bird. In a classification problem, we are not predicting a continuous value (e.g., 103.5 degrees Fahrenheit) but rather a binary (spam/not spam) or multiple-choice (dog/cat/bird) outcome. Such an outcome is called categorical, and predicting it is called classification. A model like linear regression is not well equipped for this type of problem, because it outputs a number on a continuous scale, whereas we need a qualitative answer (e.g., is this email spam?) drawn from a limited set of values (spam or not spam).
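To make the mismatch concrete, here is a minimal sketch that fits an ordinary straight line to spam/not spam labels coded as 1 and 0. The data, the variable names, and the helper functions `fit_line` and `predict` are all hypothetical and invented for illustration; the point is only to show what kind of output a linear model produces when asked a yes/no question.

```python
# Hypothetical toy data, invented for illustration:
# number of suspicious links in an email, and whether it was spam (1) or not (0).
links = [0, 1, 2, 3, 4, 5, 6, 10]
is_spam = [0, 0, 0, 1, 0, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares with one input variable; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(links, is_spam)


def predict(x):
    """Continuous output of the fitted line, not a label."""
    return slope * x + intercept


# Print the line's prediction for a few link counts.
for x in (0, 5, 10, 20):
    print(f"{x:>2} links -> linear prediction {predict(x):.2f}")
```

For emails with many links, the fitted line returns a value above 1, which is neither of the two labels nor a valid probability. The output is a point on a continuous scale, so something extra is needed to turn it into a spam or not spam answer.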

While predicting continuous values with linear regression (e.g., a temperature) has many useful applications, putting a label on something has just as many. Classification adds another set of solutions to your problem-solving toolbelt. With it, you can look at a problem and recognize “Oh! This is a classification problem!” or “Wait! This is predicting a continuous value.” This skill matters because it lets you choose the right tool for the job.

Logistic regression

Logistic regression intuition

Logistic regression in Python

Understanding the log-odds

Verifying classification models

Train/test splits

False positives and false negatives

Confusion matrices: Why accuracy is misleading

Precision and recall

Validation in practice

Summary

References