OpenAI’s new ‘Voice Engine’ clones your voice in only 15 seconds

As artificial intelligence (AI) continues to advance rapidly, ChatGPT maker OpenAI is at the forefront of this progress. The research lab has unveiled a powerful new voice cloning technology called Voice Engine. With just a 15-second audio sample, it can generate a synthetic copy of a person’s voice described as “natural-sounding” and “emotive.” While the company envisions potential benefits, the technology also carries significant risks, particularly as “deepfake” manipulation becomes increasingly sophisticated.

We're sharing our learnings from a small-scale preview of Voice Engine, a model which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. https://t.co/yLsfGaVtrZ
— OpenAI (@OpenAI) March 29, 2024

What is Voice Engine?

Voice Engine is an expansion of OpenAI’s existing text-to-speech technology. A user uploads a 15-second audio sample of a voice, and the tool generates a synthetic replica. OpenAI is carefully limiting the tool’s availability during its preview phase to assess the technology’s potential for both positive and negative applications. The company emphasizes the importance of understanding the risks and developing safeguards before a wider public release.

Surprisingly, Voice Engine doesn’t rely on storing or fine-tuning user-submitted audio samples. It uses an AI model that analyzes both the provided audio snippet and the text to be read, generating a matching voice in real time without creating a permanent record of the individual’s voice.

While voice cloning isn’t new, OpenAI asserts that its approach delivers superior quality. Moreover, the aggressive pricing unveiled in early marketing materials underscores the potential for Voice Engine to disrupt industries reliant on voice work.

Potential benefits…

OpenAI envisions Voice Engine assisting people with reading difficulties, translating content into other languages, and even helping people who have lost their speech to communicate. The company cites a Brown University pilot in which a patient with a speech impairment used a Voice Engine clone created from a recording made for an old school project.

…But also serious risks

As AI voice generation becomes more advanced and accessible, it’s not hard to see how bad actors could exploit this technology for malicious deepfakes. Voice Engine arrives in an environment where misinformation aided by realistic audio and video manipulation is already a major concern. OpenAI acknowledges the “serious risks,” which are even more pronounced during an election year.

Voice Engine could also commoditize voice work, making it cheaper and easier for businesses to use synthetic voices rather than hire human talent. While some AI companies offer marketplaces or compensation models for voice actors whose voices are cloned, OpenAI’s approach primarily relies on user consent and proper disclosure. It remains to be seen how the industry will adapt and whether regulations will be put in place to ensure fair compensation and ethical use of voice acting talent.

Delayed rollout, pricing and the bigger picture

Recognizing the need for caution, OpenAI is conducting a limited preview while incorporating feedback from various sectors to decrease the potential for harm. Preview testers must agree to policies prohibiting impersonation without consent and requiring clear disclosure of AI-generated speech. In addition, OpenAI is implementing watermarking to trace audio origins and will monitor how the system is used. A “no-go voice list” aims to prevent the generation of prominent figures’ voices.

While the official release date is unknown, leaked information and a TechCrunch report suggest Voice Engine could be incredibly affordable – costing $15 for enough text to fill a Stephen King novel. This undercuts many competitors and could make AI-generated audiobooks tempting.

OpenAI’s announcements extend beyond Voice Engine. This week, they also revealed a partnership with Microsoft to build the “Stargate” AI supercomputer, reportedly a $100 billion project.