5 Outlier detection using scikit-learn

This chapter covers

  • An introduction to the scikit-learn library
  • A description and examples of the Isolation Forest, local outlier factor, one-class Support Vector Machine, and Elliptic Envelope detectors
  • A description of three other tools provided by scikit-learn: BallTree, KDTree, and Gaussian mixture models
  • How to use these tools most effectively
  • When each tool is most appropriate

We now have a good general understanding of outlier detection, of some specific algorithms, and of how outlier detection projects proceed. We turn next to the standard libraries for outlier detection, which provide most, if not all, of the tools you will need for outlier detection projects, at least with tabular data. Understanding these libraries well is a major step toward executing outlier detection projects effectively.

5.1 Introducing scikit-learn

5.2 Isolation Forest

5.2.1 The KDD Cup dataset

5.2.2 Using Isolation Forest on the KDD Cup dataset

5.3 LocalOutlierFactor (LOF)

5.4 One-class SVM (OCSVM)

5.4.1 OneClassSVM class

5.4.2 SGDOneClassSVM

5.5 Elliptic Envelope

5.5.1 Mahalanobis distance

5.5.2 Example using the EllipticEnvelope class

5.6 Gaussian mixture models

5.7 BallTree and KDTree

Summary