15 Classifying data with logistic regression

This chapter covers

Understanding classification problems and measuring classifiers
Finding decision boundaries to classify two kinds of data
Approximating classified data sets with logistic functions
Writing a cost function for logistic regression
Carrying out gradient descent to find a logistic function of best fit

One of the most important classes of problems in machine learning is classification, which we’ll focus on in the last two chapters of this book. A classification problem is one where you have one or more pieces of raw data, and you want to say what kind of object each one represents. For instance, you might want an algorithm to look at the data of all email messages entering your inbox and classify each one as an interesting message or as unwanted spam. As an even more impactful example, you could write a classification algorithm to analyze a data set of medical scans and decide whether each one shows a benign or a malignant tumor.

We can build machine learning algorithms for classification that learn from examples: the more real data an algorithm sees, the better it performs at the classification task. For instance, every time an email user flags a message as spam or a radiologist identifies a malignant tumor, that labeled example can be passed back to the algorithm to improve its calibration.
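To make the idea of a classifier and its measurement concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy data, the feature choice (word count and link count), and the names `is_spam` and `accuracy` are hypothetical, and the rule is written by hand rather than learned.

```python
# Hypothetical toy data: (word count, link count, label).
# Label 1 means spam, 0 means not spam. Values are made up for illustration.
labeled_messages = [
    (120, 0, 0),
    (15, 4, 1),
    (300, 1, 0),
    (20, 6, 1),
    (45, 3, 0),
]

def is_spam(word_count, link_count):
    """Hand-written rule (an assumption, not a learned model):
    call a message spam if it has three or more links."""
    return 1 if link_count >= 3 else 0

def accuracy(classifier, data):
    """Fraction of labeled examples the classifier gets right."""
    correct = sum(
        1 for *features, label in data
        if classifier(*features) == label
    )
    return correct / len(data)

print(accuracy(is_spam, labeled_messages))
```

Under these assumptions the rule misclassifies one of the five messages, the legitimate message with three links. A learning algorithm would aim to adjust such a rule automatically as more labeled examples arrive.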

15.1 Testing a classification function on real data

15.1.1 Loading the car data

15.1.2 Testing the classification function

15.1.3 Exercises

15.2 Picturing a decision boundary

15.2.1 Picturing the space of cars

15.2.2 Drawing a better decision boundary

15.2.3 Implementing the classification function

15.2.4 Exercises

15.3 Framing classification as a regression problem

15.3.1 Scaling the raw car data

15.3.2 Measuring BMWness of a car

15.3.3 Introducing the sigmoid function

15.3.4 Composing the sigmoid function with other functions

15.3.5 Exercises

15.4 Exploring possible logistic functions

15.4.1 Parameterizing logistic functions

15.4.2 Measuring the quality of fit for a logistic function

15.4.3 Testing different logistic functions

15.6 Summary