Chapter 4. A gentle introduction to classification


This chapter covers

  • Writing formal notation
  • Using logistic regression
  • Working with a confusion matrix
  • Understanding multiclass classification

Imagine an advertising agency collecting information about user interactions to decide what type of ad to show. That’s not uncommon. Google, Twitter, Facebook, and other tech giants that rely on ads keep creepy-good personal profiles of their users to help deliver personalized ads. A user who’s recently searched for gaming keyboards or graphics cards is probably more likely to click ads about the latest and greatest video games.

Delivering a specially crafted advertisement to each individual may be difficult, so grouping users into categories is a common technique. For example, a user may be categorized as a “gamer” to receive relevant video game–related ads.

Machine learning is the go-to tool for such a task. At the most fundamental level, machine-learning practitioners want to build a tool to help them understand data. Labeling data items as belonging to separate categories is an excellent way to characterize data for specific needs.

4.1. Formal notation

4.2. Measuring performance

4.3. Using linear regression for classification

4.4. Using logistic regression

4.5. Multiclass classifier

4.6. Application of classification

4.7. Summary