This chapter covers
- An introduction to unsupervised machine learning-based outlier detection
- The curse of dimensionality
- Some of the broad categories of outlier detection algorithms used
- Descriptions and examples of some specific algorithms
- The properties of outlier detectors
Suppose you are working on a challenging data problem: examining tables of financial data to identify fraud, sensor readings that may indicate a need for maintenance, or astronomical observations that may include rare or unknown phenomena. In cases like these, the statistical techniques we’ve looked at so far may be useful but not sufficient to find everything you’re interested in.
We now have a good introduction to outlier detection and can begin to look at machine learning approaches, which allow detection of a much wider range of outliers than is possible with statistical methods. The main factor distinguishing machine learning methods is that most of them, with some exceptions, are multivariate tests: they consider all features together and attempt to find unusual records, as opposed to unusual single values. A record can be unusual because of its combination of values even when each value, taken on its own, is ordinary. Considering the features together makes more subtle outliers, such as fraud, machine failure, or novel telescope readings, much more feasible to detect.
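To make the distinction between unusual single values and unusual records concrete, the following is a minimal sketch and not one of the specific detectors described in this chapter. It assumes a synthetic two-feature dataset in which the features move together, and it compares a per-feature z-score with a simple multivariate score: the distance to the k-th nearest neighbor. The names `z_scores`, `knn_distance`, and `odd`, along with the choice of `k=3`, are made up for this illustration.

```python
import math
import random
import statistics

random.seed(0)

# Synthetic data (an assumption for illustration): two features that move together.
rows = []
for _ in range(200):
    x = random.uniform(-3, 3)
    y = x + random.gauss(0, 0.1)
    rows.append((x, y))

# A record whose individual values are ordinary but whose combination is not:
# x is moderately high while y is moderately low.
odd = (2.0, -2.0)
rows.append(odd)
odd_index = len(rows) - 1


def z_scores(values):
    """Absolute z-score of each value within a single feature."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [abs(v - mean) / sd for v in values]


def knn_distance(point, data, k=3):
    """Euclidean distance from point to its k-th nearest neighbor in data."""
    dists = sorted(math.dist(point, other) for other in data if other is not point)
    return dists[k - 1]


# Univariate view: score each record by its most extreme single value.
zx = z_scores([r[0] for r in rows])
zy = z_scores([r[1] for r in rows])
univariate_score = [max(a, b) for a, b in zip(zx, zy)]

# Multivariate view: score each record using all of its features at once.
multivariate_score = [knn_distance(r, rows) for r in rows]

most_unusual_record = max(range(len(rows)), key=multivariate_score.__getitem__)

print("Largest z-score for the odd record:", round(univariate_score[odd_index], 2))
print("Index of the record with the largest k-NN distance:", most_unusual_record)
print("Index of the odd record:", odd_index)
```

Neither value of the added record is extreme within its own feature, so a per-feature test has little reason to flag it. The neighbor-distance score looks at both features together, and the record stands out because no other record has that combination of values.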