This chapter covers
- Working with the logistic regression algorithm
- Understanding feature engineering
- Understanding missing value imputation
In this chapter, I’m going to add a new classification algorithm to your toolbox: logistic regression. Just like the k-nearest neighbors algorithm you learned about in the previous chapter, logistic regression is a supervised learning method that predicts class membership. Logistic regression relies on the equation of a straight line and produces models that are very easy to interpret and communicate.
Logistic regression can handle continuous (without discrete categories) and categorical (with discrete categories) predictor variables. In its simplest form, logistic regression is used to predict a binary outcome (cases can belong to one of two classes), but variants of the algorithm can handle multiple classes as well. Its name comes from the algorithm’s use of the logistic function, an equation that calculates the probability that a case belongs to one of the classes.
While logistic regression is most certainly a classification algorithm, it uses linear regression and the equation for a straight line to combine the information from multiple predictors. In this chapter, you’ll learn how the logistic function works and how the equation for a straight line is used to build a model.