5 A gentle introduction to classification

 

This chapter covers

  • Writing formal notation
  • Using logistic regression
  • Working with a confusion matrix
  • Understanding multiclass classification

Imagine an advertising agency collecting information about user interactions to decide what type of ad to show. That’s not uncommon. Google, Twitter, Facebook, and other tech giants that rely on ads keep creepy-good personal profiles of their users to help deliver personalized ads. A user who’s recently searched for gaming keyboards or graphics cards is probably more likely to click ads about the latest and greatest video games.

Delivering an advertisement specially crafted for each person may be difficult, so grouping users into categories is a common technique. A user may be categorized as a gamer, for example, to receive relevant video game-related ads.

Machine learning is the go-to tool for accomplishing such a task. At the most fundamental level, machine-learning practitioners want to build a tool to help them understand data. Labeling data items as belonging to separate categories is an excellent way to characterize data for specific needs.

5.1 Formal notation

5.2 Measuring performance

5.2.1 Accuracy

5.2.2 Precision and recall

5.2.3 Receiver operating characteristic curve

5.3 Using linear regression for classification

5.4 Using logistic regression

5.4.1 Solving 1D logistic regression

5.4.2 Solving 2D regression

5.5 Multiclass classifier

5.5.1 One-versus-all

5.5.2 One-versus-one

5.5.3 Softmax regression

5.6 Application of classification