This chapter covers
- The PyOD library
- Several of the detectors provided by the library
- Guidance on where the different detectors are most useful
- PyOD’s support for thresholding scores and accelerating training
The PyOD (Python Outlier Detection) library (https://github.com/yzhao062/pyod) provides the largest collection of outlier detectors available in Python for numeric tabular data, covering both traditional machine learning and deep learning-based methods. It is probably the most effective tool for outlier detection in Python. Beyond its detectors, PyOD provides several other tools, including support for thresholding scores and accelerating training. Like scikit-learn, it also provides a simple, consistent API across its detectors, which makes it efficient to work with.
In all, depending on how similar detectors are counted, PyOD includes about 29 detectors based on traditional unsupervised machine learning and eight based on deep learning (most of which are forms of Generative Adversarial Network (GAN) or AutoEncoder-based detection). We cover the former here and the deep learning methods in chapter 16. It’s not feasible to cover all the traditional detectors in PyOD, so we look at a sample of them. This provides a good introduction to PyOD and further illustrates the diversity of methods available today for identifying outliers in tabular data.