Chapter 8 introduced fraud detection techniques by showing two approaches based on identifying relationships that are explicit in the data. In the first case, each transaction connected the cardholder to the merchant where the card was used. In the second case, bank or credit card accounts were connected by the owner’s personal or access details (phone number, address, IP address, and so on). In most cases, however, such relationships are not explicit, and we need to do more work to infer or discover the connections between data items before we can detect and combat fraud.
This chapter explores advanced algorithms for fighting fraud, borrowed from anomaly detection theory, that can recognize anomalous items in large transactional datasets in which the data points appear to be independent. As I touched on in chapter 8, anomaly detection is the branch of data mining concerned with discovering rare occurrences, or outliers, in datasets. When you’re analyzing large and complex datasets, determining what stands out in the data is often at least as important and interesting as learning about its general structure.