This chapter covers
- The benefits and tradeoffs of creating ensembles
- Selecting the detectors for an ensemble
- Scaling the scores from the detectors
- Combining the scores from each detector into a final score
Often when evaluating outlier detectors, we identify a number of detectors that appear to work well, though none works perfectly. We may also find that we reliably detect the known outliers we test with but are not confident we will catch the full range of outliers we may encounter in the future. In most cases, the solution is to combine multiple outlier detectors into an ensemble. This is a powerful technique, and ensembles are very commonly used for outlier detection problems.
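To make the idea concrete, here is a minimal sketch of a small ensemble, assuming scikit-learn's IsolationForest and LocalOutlierFactor as the two detectors (the specific detectors, scaling, and combination method are illustrative choices, which the rest of this chapter examines in more detail). Each detector's scores are min-max scaled to a common range and then averaged into a final score.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

X, _ = make_blobs(n_samples=500, centers=2, random_state=0)

# IsolationForest's score_samples() returns higher values for more
# normal records, so we negate to get "higher = more anomalous".
if_scores = -IsolationForest(random_state=0).fit(X).score_samples(X)

# LocalOutlierFactor's negative_outlier_factor_ is also higher for
# more normal records, so we negate it as well.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
lof_scores = -lof.negative_outlier_factor_

def min_max_scale(scores):
    """Rescale scores to [0, 1] so the detectors are comparable."""
    return (scores - scores.min()) / (scores.max() - scores.min())

# Final ensemble score: the average of the two scaled scores.
final_scores = (min_max_scale(if_scores) + min_max_scale(lof_scores)) / 2

# The highest-scoring rows are the ensemble's strongest outliers.
print(np.argsort(final_scores)[-5:])
```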
This parallels the use of ensembles in prediction problems, where ensembling is well understood to be a very powerful technique. In fact, with tabular data, the strongest models tend to be ensembles; XGBoost, LightGBM, and CatBoost, for example, are ensembles of decision trees. Similarly, the strongest AutoML tools, such as AutoGluon, focus on creating ensembles of models as the most effective means to build highly accurate predictive systems.