8 Evaluating detectors and parameters


This chapter covers

  • The effect of the k parameter used with k nearest neighbors and local outlier factor
  • Techniques to evaluate detectors
  • Evaluating the similarity in scores between detectors
  • Using synthetic data to test outlier detectors
  • Comparing the train and predict times for detectors under different workloads

Now that we’ve described a number of outlier detection algorithms, the question you’ll face is: which is the best detector, or the best set of detectors, to use for your projects? There is no complete answer, as each detector is more appropriate in some circumstances than others, but in this chapter we’ll go through some methods to help you compare outlier detectors.
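To give a sense of the kind of comparison this chapter builds up to, the following is a minimal sketch: it fits KNN and LOF detectors (from the PyOD library) to a small synthetic dataset and measures how strongly their outlier scores agree as the number of neighbors, k, varies. The dataset, the parameter values, and the use of Spearman correlation here are illustrative assumptions for the sketch, not the chapter's own examples.

```python
# A minimal sketch of comparing two detectors on synthetic data.
# Assumes PyOD, scikit-learn, and SciPy are installed; the data and
# parameter values are illustrative, not taken from this chapter.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_blobs
from pyod.models.knn import KNN
from pyod.models.lof import LOF

# Synthetic data: two dense clusters plus a handful of scattered points
# that should receive high outlier scores.
np.random.seed(0)
X_inliers, _ = make_blobs(n_samples=500, centers=2, cluster_std=1.0,
                          random_state=0)
X_outliers = np.random.uniform(low=-15, high=15, size=(10, 2))
X = np.vstack([X_inliers, X_outliers])

# Fit KNN and LOF with the same k and check how similarly they rank the
# points, using the Spearman (rank) correlation of their training scores.
for k in [5, 20, 50]:
    knn = KNN(n_neighbors=k).fit(X)
    lof = LOF(n_neighbors=k).fit(X)
    corr, _ = spearmanr(knn.decision_scores_, lof.decision_scores_)
    print(f"k={k}: Spearman correlation of KNN and LOF scores: {corr:.3f}")
```

The sections that follow examine each piece of this in more depth: the effect of k, ways to visualize and evaluate the resulting scores, how to construct test data when labels are unavailable, and how train and predict times compare across detectors.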

8.1 The effect of the number of neighbors

8.1.1 2D plots of results

8.1.2 1D plots of results

8.2 Contour plots

8.2.1 Examining parameter choices with contour plots

8.2.2 Examining detectors with contour plots

8.3 Visualizing subspaces in real-world data

8.3.1 2D plots of results on real-world data

8.3.2 Contour plots on real-world data

8.4 Correlation between detectors with full real-world datasets

8.5 Modifying real-world data

8.5.1 Adding known anomalies

8.5.2 Evaluation metrics

8.5.3 Evaluating detectors using accuracy metrics

8.5.4 Adjusting the training size used

8.5.5 Adding extreme values

8.6 Testing with classification datasets

8.7 Timing experiments

Summary