8 Evaluating detectors and parameters


This chapter covers

  • The effect of the k parameter used with k nearest neighbors and local outlier factor
  • Techniques to evaluate detectors
  • Evaluating the similarity in scores between detectors
  • Using synthetic data to test outlier detectors
  • Comparing the train and predict times for detectors under different workloads

Now that we’ve described a number of outlier detection algorithms, the question you’ll face is: which is the best detector, or the best set of detectors, to use for your projects? There is no complete answer, as each detector is more appropriate in some circumstances than others, but in this chapter we’ll go through some methods to help you compare outlier detectors.
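To give a sense of the kind of comparison this chapter builds up to, the following is a minimal sketch: it fits KNN and LOF detectors (from the PyOD library) to a small synthetic dataset and measures how strongly their outlier scores agree as the number of neighbors, k, varies. The dataset, the parameter values, and the use of Spearman correlation here are illustrative assumptions for the sketch, not the chapter's own examples.

```python
# A minimal sketch of comparing two detectors on synthetic data.
# Assumes PyOD, scikit-learn, and SciPy are installed; the data and
# parameter values are illustrative, not taken from this chapter.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_blobs
from pyod.models.knn import KNN
from pyod.models.lof import LOF

# Synthetic data: two dense clusters plus a handful of scattered points
# that should receive high outlier scores.
np.random.seed(0)
X_inliers, _ = make_blobs(n_samples=500, centers=2, cluster_std=1.0,
                          random_state=0)
X_outliers = np.random.uniform(low=-15, high=15, size=(10, 2))
X = np.vstack([X_inliers, X_outliers])

# Fit KNN and LOF with the same k and check how similarly they rank the
# points, using the Spearman (rank) correlation of their training scores.
for k in [5, 20, 50]:
    knn = KNN(n_neighbors=k).fit(X)
    lof = LOF(n_neighbors=k).fit(X)
    corr, _ = spearmanr(knn.decision_scores_, lof.decision_scores_)
    print(f"k={k}: Spearman correlation of KNN and LOF scores: {corr:.3f}")
```

The sections that follow examine each piece of this in more depth: the effect of k, ways to visualize and evaluate the resulting scores, how to construct test data when labels are unavailable, and how train and predict times compare across detectors.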

8.1 The effect of the number of neighbors

8.1.1 2D plots of results

8.1.2 1D plots of results

8.2 Contour plots

8.2.1 Examining parameter choices with contour plots

8.2.2 Examining detectors with contour plots

8.3 Visualizing subspaces in real-world data

8.3.1 2D plots of results on real-world data

8.3.2 Contour plots on real-world data

8.4 Correlation between detectors with full real-world datasets

8.5 Modifying real-world data

8.5.1 Adding known anomalies

8.5.2 Evaluation metrics

8.5.3 Evaluating detectors using accuracy metrics

8.5.4 Adjusting the training size used

8.5.5 Adding extreme values

8.6 Testing with classification datasets

8.7 Timing experiments

Summary