This chapter covers:
- Defining precision, recall, true/false positives/negatives, how they relate to one another, and what they mean in terms of our model’s performance.
- A new quality metric, the F1 score, and its strengths compared to other possible quality metrics (see the sketch after this list).
- Updating our `logMetrics` function to compute and store precision, recall, and F1 score.
- Balancing our `LunaDataset` to address the training issues uncovered at the end of chapter 8.
- Using TensorBoard to graph our quality metrics as each epoch of training occurs, and verifying that our work to balance the data results in an improved F1 score.
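As a preview of the metrics we'll formalize in this chapter, here is a minimal sketch of how precision, recall, and the F1 score can be computed from counts of true positives, false positives, and false negatives. The function and variable names below are purely illustrative; they are not the ones used in the book's `logMetrics` code.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    # Precision: of everything we flagged as positive, how much really was?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    # Recall: of everything that really was positive, how much did we flag?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical counts: 10 true positives, 5 false positives, 20 false negatives
print(precision_recall_f1(10, 5, 20))  # (0.667, 0.333, 0.444), roughly
```

Because F1 is the harmonic mean of precision and recall, it stays low unless both are reasonably high, which is what makes it harder to game than raw accuracy.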
The close of the last chapter left us in a predicament. While we were able to get the mechanics of our deep learning project in place, none of the results were actually useful; the network simply classified everything as benign! To make matters worse, the results seemed great on the surface, since we were looking at the percentage of training and validation samples that were classified correctly. With our data heavily skewed toward benign samples, blindly calling everything benign is a quick and easy way for our model to score well.
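To see why accuracy alone paints such a rosy picture, consider a deliberately simplified example. The 99-to-1 split below is illustrative only, not the actual ratio in our LUNA data.

```python
# Why raw accuracy is misleading on skewed data: a model that labels
# *everything* benign still gets almost everything "right".
num_benign, num_malignant = 990, 10

correct = num_benign                # every benign sample counts as correct
total = num_benign + num_malignant

accuracy = correct / total          # 0.99 -- looks great on paper
recall = 0 / num_malignant          # 0.0  -- not a single malignant sample found

print(f"accuracy: {accuracy:.0%}, recall: {recall:.0%}")
```

Metrics like recall and F1 expose this failure mode immediately, which is exactly why we'll add them to our training loop in this chapter.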