7 Additional libraries and algorithms for outlier detection

This chapter covers

  • Additional Python libraries that support outlier detection
  • Additional algorithms not found in libraries
  • Three algorithms that support categorical data
  • An interpretable outlier detection method, association rules
  • Examples and techniques you’ll need to develop your own outlier detection code where necessary

Although scikit-learn and PyOD cover a large number of outlier detection algorithms, many algorithms outside these libraries can be equally useful for outlier detection projects. In addition, the detectors these libraries provide handle only numeric, not categorical, data and may not always be as interpretable as a project requires. Of the algorithms we’ve looked at so far, only frequent pattern outlier factor (FPOF) and histogram-based outlier score (HBOS) provide good support for categorical data, and most have low interpretability. In this chapter we’ll introduce some other detectors better suited to categorical and mixed data: entropy, association rules, and a clustering method for categorical data. Association rules detectors have the additional benefit of being quite interpretable.

7.1 Example synthetic test sets

7.2 The alibi-detect library

7.3 The PyCaret library

7.4 Local outlier probability (LoOP)

7.5 Local distance-based outlier factor (LDOF)

7.6 Extended Isolation Forest (EIF)

7.7 Outlier detection using in-degree number (ODIN)

7.8 Clustering

7.8.1 Mahalanobis distance per cluster

7.8.2 Kernel density estimation per cluster

7.8.3 Clustering with categorical data

7.9 Entropy

7.10 Association rules

7.11 Convex hull

7.12 Distance metric learning (DML)

7.13 NearestSample

Summary