chapter seven

7 KYC fraud detection using deep learning

This chapter covers

Understanding KYC (Know Your Customer) in the digital realm
Architecting a KYC fraud detection system with automatable checks
Matching faces across selfies and IDs (Identity Documents) using deep learning
Building an ID information extraction model using PyTorch and HuggingFace

KYC is a process that businesses, especially financial institutions, use to verify the identity of their customers. You have probably gone through it when signing up for a digital banking, trading, gambling, or even dating app, where you had to capture a photo of your face (a selfie) and a photo of your ID (such as a passport). The purpose of KYC is to stop bad actors from onboarding as customers, which prevents downstream fraud and helps combat the multi-trillion-dollar financial crime industry (https://www.dowjones.com/professional/risk/resources/glossary).

KYC has its origins in the U.S. Bank Secrecy Act of 1970, which required financial organizations to build systems to detect suspicious activity. These requirements matured into a more robust set of KYC guidelines, formulated by the Bank of England in the early 1990s. Events such as 9/11 and the financial crisis of 2008 led to further strengthening of KYC processes (https://dojah.io/blog/the-history-of-kyc).

KYC is typically broken down into three key components, as shown in figure 7.1:

7.1 How do we automate KYC?

7.1.1 Types of KYC data

7.1.2 Data contained within identity documents

7.1.3 Automated KYC checks

7.2 Automating the face-matching KYC checks

7.2.1 How do face-matching models work?

7.2.2 How is a face recognition model trained?

7.2.3 Matching faces between two selfies

7.2.4 Matching faces between a selfie and an ID

7.3 Information extraction

7.3.1 Loading the dataset

7.3.2 Processing the dataset

7.3.3 Define and train model

7.3.4 Test model

7.3.5 Run model on real data

7.4 Summary