chapter seven

7 How do you measure classification models? Accuracy and its friends

This chapter covers

How accuracy can help us evaluate models.
Types of errors a model can make: false positives and false negatives.
Putting these errors in a table: the confusion matrix.
What is recall, and what models require this metric?
What is precision, and what models require this metric?
Metrics that combine recall and precision, such as the F1-score and the Fβ-score.
Two new ways to evaluate a classification model: sensitivity and specificity.
What is the threshold of a classification model, and what models have this feature?
How does changing the threshold of a model affect the sensitivity and specificity?
What is the ROC curve, and how does it keep track of sensitivity and specificity while changing the threshold of the model?
What is the area under the curve (AUC), and how does it evaluate our classification models?
Trading off sensitivity against specificity to pick the model that best solves the problem at hand.

7.1 Accuracy - How often is my model correct?

7.1.1 Two examples of models - Coronavirus and spam email

7.1.2 A super effective yet super useless model

7.2 How to fix the accuracy problem? Defining different types of errors and how to measure them

7.2.1 False positives, false negatives, and which one is worse?

7.2.2 Storing the correctly and incorrectly classified points in a table - the confusion matrix

7.2.3 Recall - Among the positive examples, how many did we correctly classify?

7.2.4 Precision - Among the examples we classified as positive, how many did we correctly classify?

7.2.5 Combining recall and precision as a way to optimize both - The F-score

7.2.6 Recall, precision, or F-scores - Which one should I use?

7.3 A very useful tool to evaluate our model - The receiver operating characteristic (ROC) curve

7.3.1 Sensitivity and specificity - two new ways to evaluate our model (actually only one of them is new)

7.3.2 The receiver operating characteristic (ROC) curve: a way to optimize sensitivity and specificity in a model

7.3.3 A metric that tells us how good our model is - The AUC (area under the curve)

7.3.4 How to make decisions using the ROC curve

7.3.5 Recall is sensitivity, but precision and specificity are different

7.4 Summary

7.5 Exercises

7.5.1 Exercise 7.1

7.5.2 Exercise 7.2

7.5.3 Exercise 7.3

7.5.4 Exercise 7.4
