Humans correctly identify deepfake speech only 73% of the time, meaning more than a quarter of fake samples slip past listeners undetected, according to a new study.
The study, published in PLOS ONE, is the first to examine how well humans can detect artificially generated speech in a language other than English.
Deepfakes are synthetic media that mimic the voice or appearance of a real person, and they belong to the family of generative artificial intelligence. To reproduce authentic sounds or images, this type of AI uses machine learning to train an algorithm on the patterns and characteristics of a dataset, such as video or audio recordings of a real person.
Whereas earlier algorithms needed thousands of voice samples, today’s pre-trained models can replicate an individual’s speech from as little as a three-second audio clip. These open-source algorithms are not only freely available, even to a novice user, but can also be trained in a matter of days. In a notable recent step, Apple unveiled a feature for its iPhone and iPad devices that lets users clone their own voice from 15 minutes of audio.
For their study, researchers at University College London used a text-to-speech algorithm to create deepfake speech samples in English and Mandarin. Trained on two publicly available datasets, the system generated 50 deepfake recordings in each language. The samples were deliberately different from the utterances used to train the algorithm, to ensure it was synthesizing new speech rather than reproducing its original input.
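To make the generation step concrete, the following is a minimal sketch of how synthetic speech can be produced with an open-source text-to-speech library (Coqui TTS, chosen purely as an illustration; the study’s actual model and pipeline are not reproduced here, and the model name and file paths below are assumptions):

```python
# Minimal sketch of generating synthetic speech with an open-source
# text-to-speech library (Coqui TTS). Illustrative only: the model name
# and file paths are assumptions, not the ones used in the study.
from TTS.api import TTS

# Load a pre-trained multilingual model from Coqui's model zoo.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")

# Synthesize an utterance in the voice of a reference speaker from a
# short clip -- the "few seconds of audio" capability described above.
tts.tts_to_file(
    text="This sentence was never spoken by the reference speaker.",
    speaker_wav="reference_clip.wav",  # hypothetical short voice sample
    language="en",
    file_path="deepfake_sample.wav",
)
```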
To test people’s ability to tell the two apart, the researchers played both genuine and artificially generated samples to 529 participants. Participants correctly identified the fake speech only 73% of the time, and that figure improved only slightly after they received training in recognizing deepfake speech.
Lead author of the study Kimberly Mai of UCL Computer Science said, “Our findings confirm that humans are unable to reliably detect deepfake speech, whether or not they have received training to help them spot artificial content.”
“It is also worth noting that the samples we used were created with relatively old algorithms, which raises the question of whether humans would fare even worse against deepfake speech produced with the more sophisticated technology available now and in the future.”
To counter the threat posed by artificially generated audio and imagery, the researchers are now working on more capable automated speech detectors.
Generative AI audio technology has real benefits, such as greater accessibility for people with speech impairments or those who may lose their voice to illness, but fears are growing that criminals and governments could exploit it to harm individuals and societies.
Speech deepfakes are frequently caught by AI-powered detectors, which are exposed to many examples of both real and fake speech during training, Ms. Mai told The National. Through this process, the detectors learn the patterns that distinguish synthesized speech from genuine recordings.
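As a rough illustration of that training procedure, the toy sketch below fits a simple classifier on audio features extracted from labeled real and fake clips. Production detectors use far richer features and neural architectures, and the file paths here are hypothetical:

```python
# Toy sketch of the detector training loop described above: learn to
# separate real from synthesized speech. Real detectors use far richer
# features and neural networks; the file paths here are hypothetical.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def mfcc_features(path):
    """Summarize a clip as the mean of its MFCC frames."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical labeled corpus (many files per class in practice):
# label 0 = genuine speech, label 1 = deepfake.
real_paths = ["real_000.wav", "real_001.wav"]  # ... more genuine clips
fake_paths = ["fake_000.wav", "fake_001.wav"]  # ... more fake clips

X = np.array([mfcc_features(p) for p in real_paths + fake_paths])
y = np.array([0] * len(real_paths) + [1] * len(fake_paths))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```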
“Our research suggests, however, that we should not rely too heavily on the AI-powered detectors currently available on the market.”
“Although they are proficient at recognizing deepfake speech that resembles their training samples, such as when the speaker identity remains the same, their performance can deteriorate when the test audio changes, such as when the speaker identity changes or the background noise level increases.”
In one such case in 2019, the CEO of a British energy company was duped into sending hundreds of thousands of pounds to a phony supplier by fraudsters using a deepfake audio recording of his boss’s voice.
Senior study author Lewis Griffin of UCL Computer Science said, “With generative artificial intelligence technology growing more sophisticated and many of these tools openly available, we are on the verge of seeing numerous benefits as well as risks.”
“While organizations and governments must devise strategies to deal with the misuse of these tools, we should also recognize the positive possibilities that lie ahead.”
On the design and deployment of deepfake speech detectors, Ms. Mai said, “Because detectors may not adapt well to shifts in the audio, it is vital to evaluate them in varied situations, for example with different speakers, noisier environments, or varying accents, in order to minimize false positives and false negatives.”
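As one example of the kind of stress test she describes, the sketch below re-scores a fitted detector on noisier versions of its test clips. It assumes the detector `clf` and the labeled files from the earlier sketch, and the white-noise corruption model is an illustrative assumption:

```python
# Sketch of a robustness check in the spirit of that advice: re-score a
# trained detector on test clips corrupted with white noise. The detector
# (clf) and labeled files are assumed to come from the earlier sketch.
import numpy as np
import librosa

def noisy_mfcc_features(path, snr_db=10.0):
    """MFCC summary of a clip after adding white noise at a target SNR."""
    audio, sr = librosa.load(path, sr=16000)
    noise = np.random.randn(len(audio))
    # Scale the noise so the signal-to-noise ratio is roughly snr_db.
    scale = np.sqrt(audio.var() / (10 ** (snr_db / 10) * noise.var()))
    mfcc = librosa.feature.mfcc(y=audio + scale * noise, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def accuracy_under_noise(clf, paths, labels, snr_db=10.0):
    """Accuracy of a fitted detector on noise-corrupted versions of clips."""
    X_noisy = np.array([noisy_mfcc_features(p, snr_db) for p in paths])
    return clf.score(X_noisy, labels)

# Hypothetical usage, reusing clf and the labeled corpus from above:
# print("accuracy at 10 dB SNR:",
#       accuracy_under_noise(clf, real_paths + fake_paths, y))
```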