This chapter covers
- What outlier detection is
- Some examples of places where outlier detection is used
- A quick introduction to some approaches to outlier detection
- The fundamental problem of outlier detection
Outlier detection refers to the process of finding items that are unusual. For tabular data, this usually means identifying unusual rows in a table; for image data, unusual images; for text data, unusual documents; and similarly for other types of data. The specific definitions of “normal” and “unusual” vary, but at a fundamental level, outlier detection assumes that the majority of items in a dataset are normal, while those that differ significantly from the majority may be considered unusual, or outliers. For instance, when working with a database of network logs, we assume that most logs represent normal network behavior, and our goal is to locate the log records that stand out as distinct from these.
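To make this core assumption concrete, here is a minimal sketch for tabular data. It assumes each row is a tuple of numeric features, treats values near a column's median as normal, and flags any row with a value many median absolute deviations (MAD) away from its column's median. The function names, the sample log data, and the threshold of 5 are hypothetical and chosen only for illustration; this is one very simple scoring scheme, not a method prescribed here.

```python
from statistics import median

def mad_scores(values):
    """Score each value by its distance from the median, measured in units of
    the median absolute deviation (MAD). Larger scores mean more unusual."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median, so MAD is zero:
        # treat any value that differs from the median as maximally unusual.
        return [0.0 if v == med else float("inf") for v in values]
    return [abs(v - med) / mad for v in values]

def flag_outlier_rows(rows, threshold=5.0):
    """Return the indexes of rows where any column is far from that column's
    typical values. The default threshold is an arbitrary illustrative choice."""
    n_cols = len(rows[0])
    # Score each column on its own, then keep each row's largest column score.
    col_scores = [mad_scores([row[c] for row in rows]) for c in range(n_cols)]
    row_scores = [max(col_scores[c][r] for c in range(n_cols))
                  for r in range(len(rows))]
    return [r for r, score in enumerate(row_scores) if score > threshold]

# Hypothetical network-log summaries: (kilobytes sent, connection duration in seconds)
logs = [
    (12, 3), (15, 4), (11, 3), (14, 5), (13, 4),
    (16, 3), (12, 4), (950, 4), (14, 3), (13, 120),
]
print(flag_outlier_rows(logs))  # → [7, 9]
```

The two flagged rows are the one with an unusually large transfer and the one with an unusually long connection; every other row sits close to the majority. Because each column is checked separately, this sketch cannot catch rows whose values are individually typical but unusual in combination.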
Outlier detection plays a pivotal role in many fields. Its applications include fraud detection, network security, financial auditing, regulatory oversight of financial markets, medical diagnosis, and the development of autonomous vehicles. Although outlier detection often doesn’t garner the same attention as other machine learning disciplines, such as prediction, generative AI, forecasting, or reinforcement learning, it is an important discipline in its own right.