chapter five

5 Fitting a logistic regression

This chapter covers

Model fitting
Model interpretation and evaluation
Classification metrics
Data exploration through histograms and correlation heat maps

Logistic regression is a supervised learning method for predicting a binary response from one or more independent variables. It’s commonly used for classification tasks in which the dependent variable takes one of two possible categories, or classes (e.g., pass or fail, presence or absence). Based on the values of the independent variables, or predictors, it estimates the probability that a given instance belongs to a particular category. It does so with the logistic function (also known as the sigmoid function), which maps any real-valued input to a value between 0 and 1. We’ll see how that value between 0 and 1 translates to a binary outcome in section 5.1.
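The following minimal sketch shows what the logistic function does. It assumes a model that has already been fit, with a hypothetical intercept and hypothetical coefficients. The predictors are combined linearly into a log-odds score, z = b0 + b1*x1 + ... + bk*xk. The sigmoid, 1 / (1 + e^(-z)), then squashes that score into a probability. The 0.5 cutoff used to turn the probability into a class label is an illustrative assumption here, not necessarily the rule this chapter adopts.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real number into (0, 1).

    Floating point may round extreme inputs to exactly 0.0 or 1.0.
    The two branches avoid overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z < 0, so e is in (0, 1)
    return e / (1.0 + e)


def predict_probability(x, intercept, coefs):
    """Probability of the positive class for one instance.

    `intercept` and `coefs` stand in for fitted model parameters
    (hypothetical values in the example below).
    """
    # Linear combination of the predictors gives the log-odds
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    # The sigmoid converts log-odds into a probability
    return sigmoid(z)


def classify(p, threshold=0.5):
    """Map a probability to a binary label (assumed cutoff of 0.5)."""
    return 1 if p >= threshold else 0


# Hypothetical fitted parameters and one instance with two predictors
p = predict_probability([2.0, 1.0], intercept=-1.0, coefs=[0.8, -0.5])
print(round(p, 3), classify(p))
```

Here z = -1.0 + 0.8(2.0) - 0.5(1.0) = 0.1, so the predicted probability is about 0.525, which the assumed 0.5 cutoff labels as class 1. Note that z = 0 always yields a probability of exactly 0.5, the point where the model is indifferent between the two classes.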

The use cases are nearly endless. Here is just a small sample of problems that logistic regression can solve:

Banks predicting whether a loan applicant will default, using factors such as credit score, credit history, income, and (if allowed) demographic data
Wireless carriers predicting the likelihood of customers canceling their service, based on usage patterns and satisfaction scores
Political scientists predicting election outcomes by analyzing survey data
Meteorologists predicting the probability of rain from a mix of satellite and radar data, atmospheric humidity, cloud cover, and other factors
