Our voices are about as unique as our fingerprints – so how would you feel if your voice was cloned?
In recent months, a new type of deepfake known as voice cloning has emerged, where hackers use artificial intelligence (AI) to simulate your voice.
Famous faces including Stephen Fry, Sadiq Khan and Joe Biden have all fallen victim to voice cloning, while an unnamed CEO was even tricked into transferring $243,000 to a scammer after receiving a bogus phone call.
But how does it work and how convincing is it?
To find out, I had a professional hacker clone my voice – with terrifying results.
Voice cloning is an AI technique that allows hackers to take an audio recording of someone, train an AI tool on their voice, and recreate it.
Speaking to MailOnline, Dane Sherrets, a Solutions Architect at HackerOne, explained: “This was originally used to create audiobooks and to help people who have lost their voice for medical reasons.
“But today it’s being used more and more by Hollywood, and unfortunately by scammers.”
When the technology first emerged in the late 1990s, its use was limited to experts with in-depth knowledge of AI.
Over the years, however, the technology has become more accessible and affordable, to the point where almost anyone can use it, Mr. Sherrets said.
“Someone with very limited experience can clone a voice,” he said.
“It might take less than five minutes with some of the tools available that are free and open source.”
To clone my voice, all Mr. Sherrets needed was a five-minute clip of me talking.
I opted to record myself reading a Daily Mail story, although Mr. Sherrets says most hackers can simply grab the audio of a short phone call or even a video posted to social media.
“It can happen during a conversation, when something is shared on social media, or even when someone is on a podcast. Basically just things we upload or record every day,” he said.
After I sent Mr. Sherrets the clip, he simply uploaded it to a tool (which he declined to name) that could then be “trained” on my voice.
“Once that was done, I could type into the tool or even speak directly and have it say whatever I wanted in your voice,” he said.
“What’s really crazy about the tools that are out there now is that I can add extra inflections, pauses, or other things that make the speech sound more natural, making it a lot more convincing in a con scenario.”
Despite having no pauses or added inflections, the first snippet of my voice clone that Mr. Sherrets created was surprisingly convincing.
The robotic voice matched my American-Scottish hybrid accent perfectly and said, “Hey Mom, it’s Shivali. I have lost my bank card and need to transfer money. Can you please send some to the account that just texted you?”
The creepiness was taken a step further in the next clip, in which Mr. Sherrets added pauses.
“Towards the end you hear a long pause and then a breath, and that sounds a lot more natural,” the professional hacker explained.
While my experience with voice cloning was fortunately only a demonstration, Mr. Sherrets highlights some of the serious dangers of the technology.
“Some people have had hoax kidnapping calls where their ‘child’ has called them and said, ‘I’ve been kidnapped, I need millions of dollars or they won’t release me,’ and the child sounds very upset,” he said.
“What we’re seeing more and more today is people trying to make more targeted social engineering attempts against companies and organizations.
“I used the same technology to clone my CEO’s voice.
“CEOs often appear in public, so it’s very easy to get high-quality audio of their voice and clone it.
“Having a CEO’s voice makes it a lot easier to quickly get a password or access to a system. Companies and organizations must be aware of that risk.”
Fortunately, Mr. Sherrets says there are several key signs that indicate a voice is a clone.
“There are important signals,” he told MailOnline.
“There are the pauses, the problems where it doesn’t sound as natural, and there are maybe what you call ‘artifacts’ in the background.
“For example, if a voice is cloned in a busy room and there are a lot of other people chatting, then when that voice clone is used, you will hear some junk in the background.”
However, as the technology continues to develop, these signals will become more difficult to recognize.
“People need to be aware of this technology and be constantly suspicious of anything that requires them to take urgent action – that’s often a red flag,” he explained.
“They need to be quick to ask questions that perhaps only the real person will actually know, and not be afraid to verify things before taking action.”
Mr. Sherrets recommends having a “safe word” with your family and friends.
“If you’re really in an urgent situation, you can say that safe word and they’ll know right away that it’s really you,” he said.
Finally, the expert advises being aware of your digital footprint and keeping an eye on how much you upload online.
“Every time I upload now it increases my audio attack surface and can be used later to train AI,” he added.
“There are trade-offs for that that everyone will have to make, but it’s something you have to be aware of: audio of yourself floating around out there can be used against you.”