8 Detect voice faking using transformers
This chapter covers
- A brief review of voice faking
- Understanding the Fake-or-Real dataset
- Extracting audio features from voice samples
- Training a transformer model to detect fake voices
- Testing model performance on a different fake voice dataset
“Please wait as we connect you to one of our representatives.” It’s a line most of us have heard at some point while calling customer service. The voice is automated, which means that no actual person is speaking with you live. Such lines are either prerecorded by a human or entirely computer-generated (think Apple’s Siri or Amazon’s Alexa). As AI becomes increasingly sophisticated and widely available, computer-generated voices are becoming the norm, with use cases such as personal voice assistants (Siri, Alexa), customer service (Bland AI, Retell AI), celebrity voices (ElevenLabs, Descript), and so on.