4 Evaluation metrics for classification
This chapter covers
- Accuracy as a way of evaluating binary classification models and its limitations
- Determining where our model makes mistakes using a confusion table
- Deriving other metrics like precision and recall from the confusion table
- Using ROC (receiver operating characteristic) curves and AUC (area under the ROC curve) to further understand the performance of a binary classification model
- Cross-validating a model to obtain a more reliable estimate of its performance
- Tuning the parameters of a model to achieve the best predictive performance
In this chapter we will continue with the project we started in the previous chapter: churn prediction. We have already downloaded the dataset, performed the initial preprocessing and exploratory data analysis, and trained a model that predicts whether customers will churn. We have also evaluated this model on the validation dataset and concluded that it is 80% accurate.
The question we postponed until now is whether 80% accuracy is actually good and what it tells us about the quality of our model. We will answer this question in this chapter and discuss other ways of evaluating a binary classification model: the confusion table, precision and recall, the ROC curve, and AUC.
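As a quick refresher, accuracy is simply the fraction of predictions our model got right. A minimal sketch of how we computed it on the validation set is shown below; the small arrays here are placeholders standing in for `y_val` (the true churn labels) and `y_pred` (the predicted churn probabilities) from the previous chapter.

```python
import numpy as np

# Placeholder data: in the book, y_val and y_pred come from the
# churn model trained and validated in the previous chapter.
y_val = np.array([0, 1, 0, 0, 1])          # true labels (1 = churned)
y_pred = np.array([0.2, 0.8, 0.6, 0.1, 0.4])  # predicted probabilities

churn_decision = y_pred >= 0.5             # binarize at the 0.5 threshold
accuracy = (y_val == churn_decision).mean()  # fraction of correct predictions
print(accuracy)
```

Note that this computation hides a choice we made implicitly: the 0.5 decision threshold. As we will see, both the threshold and accuracy itself deserve closer scrutiny.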