chapter one

1 Introducing outlier detection

This chapter covers

What outlier detection is
Some examples of places where outlier detection is used
A quick introduction to some approaches to outlier detection
The fundamental problem of outlier detection

Outlier detection is the process of finding items that are unusual. For tabular data, this usually means identifying unusual rows in a table; for image data, unusual images; for text data, unusual documents; and similarly for other types of data. The specific definitions of “normal” and “unusual” can vary, but at a fundamental level, outlier detection assumes that the majority of items in a dataset can be considered normal, while those that differ significantly from the majority may be considered unusual, or outliers. For instance, when working with a database of network logs, we assume that the majority of log records represent normal network behavior, and our goal is to locate the records that stand out as distinct from these.
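To make the idea of "differing significantly from the majority" concrete, here is a minimal sketch for a single numeric column. It is only one simple way to flag unusual values, not the method this book settles on. It scores each value by its distance from the median, scaled by the median absolute deviation (MAD). The function name `flag_outliers`, the sample response times, and the handling of a zero MAD are illustrative assumptions; the 0.6745 constant and the 3.5 cutoff follow a commonly used convention for the modified z-score.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a robust estimate of the typical spread,
    # little affected by the outliers we are trying to find.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # Assumption for this sketch: if more than half the values are
        # identical, treat anything that differs from them as unusual.
        return [v for v in values if v != center]
    # 0.6745 rescales the MAD so scores are comparable to z-scores
    # when the data are roughly normally distributed.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical response times (in ms) taken from network log records.
response_times_ms = [102, 98, 105, 99, 101, 97, 103, 100, 950]
print(flag_outliers(response_times_ms))  # prints [950]
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value, such as the 950 above, can inflate the mean and standard deviation enough to hide itself.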

Outlier detection plays a central role in many fields. Its applications include fraud detection, network security, financial auditing, regulatory oversight of financial markets, medical diagnosis, and the development of autonomous vehicles. Although outlier detection rarely receives the attention given to other areas of machine learning, such as prediction, generative AI, forecasting, or reinforcement learning, it is essential in each of these fields.

1.1 Why do outlier detection?

1.1.1 Financial fraud

1.1.2 Credit card fraud

1.1.3 Network security

1.1.4 Detecting bots on social media

1.1.5 Industrial processes

1.1.6 Self-driving vehicles

1.1.7 Healthcare

1.1.8 Astronomy

1.1.9 Data quality

1.1.10 Evaluating segmentation

1.2 Outlier detection’s place in machine learning

1.3 Outlier detection in tabular data

1.4 Definitions of outliers

1.5 Trends in outlier detection

1.6 How does this book teach outlier detection?

Summary