chapter five

5 Outlier detection using scikit-learn

This chapter covers

  • An introduction to the scikit-learn library
  • Descriptions and examples of the IsolationForest, LocalOutlierFactor, One-class SVM, and EllipticEnvelope detectors
  • A description of three other tools provided by scikit-learn: BallTree, KDTree, and Gaussian mixture models
  • Guidelines for using these tools most effectively
  • A discussion of where each is most appropriate

We now have a good general understanding of outlier detection, of some specific algorithms, and of how outlier detection projects proceed. We will now look at the standard libraries for outlier detection, which provide most, if not all, of the tools you will need for most outlier detection projects, at least those involving tabular data. Understanding these libraries well is a major step toward executing outlier detection projects effectively.

5.1 Introduction to scikit-learn

5.2 IsolationForest

5.2.1 The KDD Cup dataset

5.2.2 Using IsolationForest on the KDD Cup dataset

5.3 LocalOutlierFactor (LOF)

5.4 One-class SVM (OCSVM)

5.4.1 OneClassSVM class

5.4.2 SGDOneClassSVM

5.5 EllipticEnvelope

5.5.1 Mahalanobis distance

5.5.2 Example using the EllipticEnvelope class

5.6 Gaussian mixture models (GMM)

5.7 BallTree and KDTree

5.8 Summary