After Just 3 Seconds, This AI Can Mimic Your Speech
January 11, 2023 By Jozeph P
(Image Credit: Google)
Microsoft's announcement that it is working on an AI that can mimic anyone's voice from just a three-second sample has added fuel to artificial intelligence's (AI) current moment in the spotlight.
According to Microsoft, the new tool, known as VALL-E, was trained on some 60,000 hours of English-language speech recordings, a dataset the company describes as "hundreds of times larger than existing systems." Drawing on that training, its developers say it needs only a small amount of vocal input to learn to mimic a user's speech.
What sets VALL-E apart from competing voice AI programs is its ability to accurately reproduce each sample's emotion, vocal tone, and acoustic environment. This makes its output sound more lifelike and brings the results closer to what would pass for real human speech.
Photo Credit: Laptop Mag
Compared with other text-to-speech (TTS) systems, Microsoft says VALL-E "substantially exceeds the state-of-the-art zero-shot TTS system in terms of voice naturalness and speaker likeness." In other words, when given the voice of a speaker it was never trained on, VALL-E sounds considerably more natural, and considerably more like that speaker, than its rivals.
Microsoft has published a small set of VALL-E samples on GitHub. Most of the results are excellent, and several samples successfully capture the lilt and accent of the original speakers. Overall, the output is convincing, although some examples are less compelling, suggesting that VALL-E is not yet a finished product.
Photo Credit: BetaNews
Potential and dangers
In a paper announcing VALL-E, Microsoft acknowledges that there "may be potential risks in the model being misused, such as spoofing voice identification or impersonating a certain speaker." Such an effective tool for producing realistic-sounding speech raises the prospect of ever more convincing deepfakes, which could be used to impersonate anyone from a former romantic partner to a world-famous public figure.
To reduce that risk, Microsoft says it is "possible to create a detection algorithm to differentiate whether an audio clip was synthesized using VALL-E." The company also says it will apply its own AI principles as it develops the model further. These principles cover areas such as accountability, fairness, safety, and privacy.
VALL-E is the latest example of Microsoft's AI research. The company is also working to integrate ChatGPT into Bing, use AI to summarize Teams meetings, and bring cutting-edge features to programs such as Outlook, Word, and PowerPoint. In addition, according to Semafor, Microsoft, which has already invested heavily in OpenAI, the company behind ChatGPT, is looking to invest a further $10 billion in the firm.
Photo Credit: YouTube
Despite the obvious hazards, tools like VALL-E could be particularly useful in medicine, for example to help people regain their voice after an injury. If done responsibly, the ability to recreate speech from such a limited input could be enormously valuable in these situations. Either way, given how much money Microsoft and other companies are spending on AI, the technology is clearly not going away any time soon.
By Jozeph P
Journalism explorer and tech enthusiast. Loves to read and write.