15 Classifying data with logistic regression
This chapter covers
- Understanding classification problems and measuring classifiers
- Finding decision boundaries to classify two kinds of data
- Approximating classified data sets with logistic functions
- Writing a cost function for logistic regression
- Carrying out gradient descent to find a logistic function of best fit
One of the most important classes of problems in machine learning is classification, which we’ll focus on in the last two chapters of this book. A classification problem is one where you’ve got one or more pieces of raw data, and we want to say what kind of object each one represents. For instance, you might want an algorithm to look at the data of all email messages entering your inbox and classify each one as an interesting message or as unwanted spam. As an even more impactful example, you could write a classification algorithm to analyze a data set of medical scans and decide whether they contain benign or malevolent tumors.
We can build machine learning algorithms for classification, where the more real data our algorithm sees, the more it learns, and the better it performs at the classification task. For instance, every time an email user flags an email as spam or a radiologist identifies a malignant tumor, this data can be passed back to the algorithm to improve its calibration.