chapter fifteen

15 Working with outlier detection predictions

This chapter covers

Processing the output produced by outlier detection systems
Improving outlier detection systems over time
Taking advantage of labeled data to create more effective ensembles

Once outlier detection systems are put in production, they will begin to identify outliers. If run on a large body of data or over a long period of time, they may flag a very large number of outliers, even if these are only a small fraction of the total data examined. It’s usually necessary to examine the output, not only to investigate the outliers found, but also to ensure that the system is working well (particularly for new systems, but even for established systems, which may degrade in effectiveness over time). It’s important that both of these tasks can be done efficiently.

15.1 Hand-labeling output

15.1.1 Hand-labeling specific types of outliers

15.1.2 Hand-scoring the outliers

15.2 Examining the flagged outliers

15.2.1 Manual inspection of the outliers

15.2.2 Executing interpretable detectors

15.2.3 Examining subspaces of features

15.3 Automating the process of sorting outlier detection results

15.3.1 Rules to sort the output of detectors

15.3.2 Classifiers to sort the output of detectors

15.3.3 Executing rules and classifiers on flagged outliers

15.4 Semisupervised learning

15.4.1 Using labeled data to create a stacked model

15.4.2 XGBOD

15.5 Regression testing

Summary