chapter four

4 Machine learning for fraud detection

This chapter covers

  • The advantage of machine learning over rules for detecting fraud
  • Types of classical machine learning models
  • Typical machine learning lifecycle
  • Fixing class imbalance in fraud datasets

We are living in a flourishing era of machine learning. Have you ever noticed how rarely we see spam emails in our inbox (as opposed to the spam folder)? Aren’t those weather forecasts mostly in the ballpark? Haven’t those commute predictions on navigation apps been mostly accurate? All of these applications are powered by machine learning.

Nearly a decade ago, the phrase "software is eating the world" (https://a16z.com/2011/08/20/why-software-is-eating-the-world/) became popular, capturing how software technology was penetrating every industry. A similar phenomenon is happening now: machine learning has become the new software, or "Software 2.0," a term coined by Andrej Karpathy (https://karpathy.medium.com/software-2-0-a64152b37c35).

4.1 Machine learning versus rules

4.2 Types of ML models

4.3 Using supervised ML for fraud detection

4.3.1 Regression versus classification

4.3.2 Using logistic regression as a fraud detector

4.3.3 Using k-nearest neighbors for catching fraud

4.3.4 Using a decision tree to detect fraud

4.3.5 Using a random forest for fighting fraud

4.3.6 Using gradient-boosted trees to fight fraud

4.4 Using unsupervised ML in fraud detection

4.4.1 Clustering fraud versus not fraud

4.4.2 Reducing fraud dataset dimensions

4.5 The machine learning lifecycle

4.5.1 Collecting fraud data

4.5.2 Cleaning a fraud dataset

4.5.3 Extracting features from a fraud dataset

4.5.4 Selecting, training, and evaluating fraud detection ML models

4.5.5 Deploying and monitoring a fraud detection model as a service

4.6 Handling class imbalance in fraud datasets

4.6.1 Using random undersampling to balance fraud data

4.6.2 Using random oversampling for data balancing

4.6.3 Using SMOTE for better oversampling

4.6.4 Using a weighted loss function to handle fraud imbalance

4.7 Summary