This chapter covers:
- What the logistic regression algorithm is and how it works
- What is feature engineering?
- What is missing value imputation?
- Building a logistic regression classifier to predict survival of the Titanic
In this chapter, I’m going add a new classification algorithm to your toolbox: logistic regression. Just like the k-nearest neighbors algorithm you learned about in the previous chapter, logistic regression is a supervised learning method that predicts class membership. Logistic regression relies on the equation of a straight line and produces models which are very easy to interpret and communicate.
Logistic regression can handle continuous (without discrete categories) and categorical (with discrete categories) predictor variables. In its most simple form, logistic regression is used to predict a binary outcome (cases can belong to one of two classes), but variants of the algorithm can handle multiple classes as well. Its name comes from the algorithm’s use of the logistic function, an equation that calculates the probability that a case belongs to one of the classes.
While logistic regression is most certainly a classification algorithm, it uses linear regression and the equation for a straight line to combine the information from multiple predictors. In this chapter you’ll learn how the logistic function works and how the equation for a straight line is used to build a model.