chapter three

3 Machine learning for classification

This chapter covers

Performing exploratory data analysis for identifying important features
Encoding categorical variables to use them in machine learning models
Using logistic regression for classification

In this chapter, we are going to use machine learning to predict churn.

Churn is when customers stop using the services of a company. Thus, churn prediction is about identifying customers who are likely to cancel their contracts soon. If the company can do that, it can offer discounts on these services in an effort to keep the users.

Naturally, we can use machine learning for that: we can use past data about customers who churned and, based on that, create a model for identifying present customers who are about to leave. This is a binary classification problem. The target variable that we want to predict is categorical and has only two possible outcomes: churn or not churn.

In chapter 1, we learned that many supervised machine learning models exist, and we specifically mentioned ones that can be used for binary classification, including logistic regression, decision trees, and neural networks. In this chapter, we start with the simplest one: logistic regression. Although it’s the simplest, it’s still powerful and has several advantages over other models: it’s fast, it’s easy to understand, and its results are easy to interpret. It’s a workhorse of machine learning and one of the most widely used models in industry.
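To make the idea concrete before we get to the real dataset, here is a minimal sketch of logistic regression in plain Python. It is not the implementation we use later in the chapter. The data is made up, the single feature (tenure in years) is an assumption chosen for illustration, and the helper names `train_logistic_regression` and `churn_probability` are hypothetical. The model learns from past customers and then outputs a churn probability for present ones.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, n_steps=5000):
    """Fit a one-feature logistic regression with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Prediction errors (predicted probability minus actual label).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the average log loss with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up past customers: tenure in years and whether they churned (1) or not (0).
tenure_years = [0.1, 0.3, 0.5, 1.0, 3.0, 4.0, 5.0, 6.0]
churned = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic_regression(tenure_years, churned)


def churn_probability(tenure):
    """Score a present customer with the trained model."""
    return sigmoid(w * tenure + b)


p_new = churn_probability(0.2)    # a customer who joined recently
p_loyal = churn_probability(5.0)  # a long-standing customer

print(f"weight={w:.2f}, bias={b:.2f}")
print(f"new customer: {p_new:.2f}, loyal customer: {p_loyal:.2f}")
```

In this toy data, short-tenure customers churned and long-tenure customers stayed, so the learned weight is negative: the longer the tenure, the lower the predicted churn probability. Being able to read the model this way is part of what makes logistic regression easy to interpret.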

3.1 Churn prediction project

3.1.1 Telco churn dataset

3.1.2 Initial data preparation

3.1.3 Exploratory data analysis

3.1.4 Feature importance

3.2 Feature engineering

3.2.1 One-hot encoding for categorical variables

3.3 Machine learning for classification

3.3.1 Logistic regression

3.3.2 Training logistic regression

3.3.3 Model interpretation

3.3.4 Using the model

3.4 Next steps

3.4.1 Exercises

3.4.2 Other projects

Summary

Answers to exercises