chapter six

6 The PyOD library

 

This chapter covers

  • Introducing the PyOD library
  • Describing several of the detectors provided by the library
  • Providing guidance related to where the different detectors are most useful
  • Describing PyOD’s support for thresholding scores and accelerating training

The PyOD (Python Outlier Detection) library (https://github.com/yzhao062/pyod) provides the largest collection of outlier detectors available in Python for numeric tabular data, covering both traditional machine learning and deep learning-based methods. For outlier detection in Python, PyOD is probably the single most useful tool to know. Besides its large collection of detectors, PyOD provides a number of other tools and, like scikit-learn, exposes a simple, consistent API across its detectors, which makes it efficient to work with.

In all, depending on how similar detectors are counted, there are about 29 detectors based on traditional unsupervised machine learning and eight based on deep learning (mostly forms of GAN- or AutoEncoder-based detection). We cover the former here and deep-learning methods in a later chapter. It’s not feasible to cover all the traditional detectors in PyOD, so we’ll cover only a sample. This sample gives a good introduction to PyOD and further illustrates the diversity of methods that exist today for identifying outliers in tabular data.

6.1 The PyOD common API

6.2 HBOS: Histogram-based Outlier Score

6.3 ECOD: Empirical Cumulative Distribution Functions

6.4 COPOD: Copula-based Outlier Detection

6.5 ABOD: Angle-Based Outlier Detection

6.6 CBLOF: Clustering-Based Local Outlier Factor

6.7 LOCI: Local Correlation Integral

6.8 COF: Connectivity-based Outlier Factor

6.9 PCA: Principal Component Analysis

6.9.1 Univariate tests on the components

6.9.2 PyODKernelPCA

6.9.3 The PCA detector

6.9.4 KPCA

6.10 SOD: Subspace Outlier Detection

6.11 Feature Bagging

6.12 CD: Cook’s Distance

6.13 Using SUOD for faster model training

6.14 The PyOD thresholds module

6.15 Summary