New voice cloning AI allows “you” to speak several languages

This article is part of Future Explored, a weekly guide to technology changing the world. You can get stories like this straight to your inbox every Thursday morning by subscribing here.

In January, Microsoft unveiled an AI that can clone a speaker’s voice after hearing them speak for just three seconds. Although this system, VALL-E, was far from the first voice-cloning AI, the accuracy and need for such a small sound sample set a new bar for the technology.

Microsoft has now raised the bar again with an update called “VALL-E X”, which can clone a voice from a short sample (4 to 10 seconds) and then use it to synthesize speech in another language, all while preserving the voice, emotion, and tone of the original speaker.

Microsoft hasn’t released VALL-E X to the public yet, but it has published a demo page with translations between English and Chinese, along with a preprint where it reveals plans to expand the AI to include other languages.

If Microsoft decides to make the tool available, or if similar tools are rolled out by the countless other AI companies out there, we could soon be living in a world where anyone can generate audio that sounds like anyone saying anything in any language – and that could have big consequences.

Good conversation: Dozens of voice-cloning AIs are already available online and, like VALL-E, are trained on large datasets of speech. Given a sample of a new voice, they can use that training to predict how the voice would sound reading a given text and then generate that audio.
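
Microsoft hasn’t released VALL-E’s code, but open-source tools work along similar lines. Below is a minimal sketch, assuming the open-source Coqui TTS library and its XTTS v2 model (not Microsoft’s system); the file names are placeholders:

```python
# pip install TTS  (Coqui TTS, an open-source text-to-speech toolkit)
from TTS.api import TTS

# Load a multilingual model that supports voice cloning from a short sample
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# "sample.wav" is a placeholder for a few seconds of the target speaker's voice
tts.tts_to_file(
    text="Given a sample of a new voice, I can read any text aloud.",
    speaker_wav="sample.wav",  # reference clip of the voice to clone
    language="en",
    file_path="cloned_output.wav",
)
```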

Some can even do what VALL-E X does and produce audio in languages other than the one originally spoken.

These services often require longer samples than Microsoft’s AI – a person might need to recite a few dozen sentences or even provide hours of audio – and the quality of the output can vary, but having a voice clone can be hugely useful, especially for content creators.

For example, an author can use their voice clone to generate an audiobook, saving them from spending days in the recording studio or hiring a professional. They could even feed it written translations of their book to generate author-read audiobooks in several other languages.
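
As a sketch of how that workflow might look, the snippet below reuses the Coqui XTTS setup from above (again an assumption for illustration, not any particular vendor’s product) to narrate hypothetical translated chapter files in the author’s cloned voice:

```python
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Hypothetical translated manuscripts, one text file per target language
translations = {"en": "chapter1_en.txt", "fr": "chapter1_fr.txt", "es": "chapter1_es.txt"}

for lang, path in translations.items():
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # The same short sample of the author's voice drives every language
    tts.tts_to_file(
        text=text,
        speaker_wav="author_sample.wav",
        language=lang,
        file_path=f"chapter1_{lang}.wav",
    )
```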

Speech, again: Aside from helping writers, filmmakers, podcasters, and other creators reach new audiences—and new revenue streams—voice clones can also help people who have lost their own voice due to illness or injury continue to sound like themselves.

University of Edinburgh spin-out SpeakUnique, for example, makes voice clones for people with ALS and other forms of motor neuron disease. If samples from before the disease began to affect the person’s speech are not available, SpeakUnique can even repair minor impairments in the training recordings.

While SpeakUnique requires users to recite between 150 and 300 sentences to create a voice clone, advances like VALL-E may eventually make that possible with just one sentence, which could make the technology more accessible to people who find extended speaking difficult.

Once they have their voice clone, they can pair it with text-to-speech apps or eye-tracking software to communicate with their own voice. As mind-reading technology improves, users may eventually be able to use their clones after losing the ability to even move their eyes.

Actor Val Kilmer famously used voice cloning. After a battle with throat cancer left him unable to speak clearly, AI company Sonantic used 30 minutes of audio from previous films to create a voice clone for him.

Kilmer can now use it to dub his acting performances, as he recently did in “Top Gun: Maverick.”

“[Val] and his team knew that building a custom voice model would help him explore new ways to communicate, connect and create in the future,” John Flynn, Sonantic’s co-founder and CTO, wrote in a 2021 blog post.

Deepfake audio: While voice cloning gives Kilmer more work opportunities, it can have the opposite effect on other performers.

Motherboard recently reported how studios are pushing actors with less cachet than Kilmer to agree to have their voices cloned. In theory, they could get paid for one session in the recording studio, then watch their clone replace them for future work.

Tim Friedlander, president and founder of the National Association of Voice Actors, told Motherboard that some studios even use confusing language in contracts so they can get away with cloning actors’ voices without their knowledge.

“Many voice actors may have signed a contract without realizing that language like this had been added,” he said.

Other actors are told they can either agree to the clause or be passed over for work, according to Friedlander, but some performers never get the opportunity to decide whether their voice can be cloned at all – or for what purpose.

In January, internet users used startup ElevenLabs’ free voice-cloning app, which only needs a minute of sound to create a clone, to generate clips of Emma Watson, Joe Rogan and other celebrities “saying” hateful things they never actually said.

Pair a voice clone with deepfaked images and you have content that looks and sounds real but is anything but, making it easier for bad actors to not only tarnish a celebrity’s reputation, but also create convincing propaganda and spread misinformation.

ElevenLabs now requires users to pay for the service, but before it added this protection, a Motherboard journalist demonstrated how he was able to create a free voice clone of himself with just five minutes of audio and then use it to bypass his bank’s voice-recognition system.

If systems like VALL-E and VALL-E X become widely available, something as short as your voicemail message could be enough for criminals to breach your bank accounts, hack your devices, or defraud your loved ones.

Bottom line: Microsoft seems keenly aware that people can abuse its voice-cloning AIs – the demo pages for both VALL-E and VALL-E X end with ethical statements highlighting the potential for spoofing.

The VALL-E preprint also mentions the possibility of creating a system to detect AI voice clones to reduce the risk. Although that hasn’t come to fruition yet, we’re already seeing other researchers develop new ways to distinguish AI-generated voices from human ones.
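
Neither Microsoft nor those researchers have published a detector’s design, but in broad strokes, detection can be framed as a binary classifier trained on labeled human and AI-generated clips. The toy sketch below uses crude spectral features (MFCCs) and logistic regression; the file names and tiny “corpus” are hypothetical, and real research systems use far richer features and models:

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def clip_features(path: str) -> np.ndarray:
    """Summarize a clip as its mean MFCCs - a crude spectral fingerprint."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

# Hypothetical labeled corpus: 0 = recorded human, 1 = AI-generated
# (a real dataset would contain thousands of clips per class)
human_clips = ["human_01.wav", "human_02.wav"]
cloned_clips = ["clone_01.wav", "clone_02.wav"]

X = np.stack([clip_features(p) for p in human_clips + cloned_clips])
y = np.array([0] * len(human_clips) + [1] * len(cloned_clips))

detector = LogisticRegression(max_iter=1000).fit(X, y)

# Probability that an unknown clip is AI-generated
prob_fake = detector.predict_proba(clip_features("suspect.wav").reshape(1, -1))[0, 1]
print(f"Probability AI-generated: {prob_fake:.2f}")
```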

For such systems to be useful, we need to find a way to implement them, and it is not yet clear how that will work.

For now, combining voice-based passwords with other, less easily spoofed authentication methods can help us avoid being hacked by voice clones. We can also encourage our less skeptical loved ones to hang up and call us back if they think we’re in trouble – and remind them (and ourselves) not to believe everything we see and hear online.

We’d love to hear from you! If you have a comment about this article or if you have a tip for a future Freethink story, please email us at [email protected].
