Friday, March 1, 2024

Meta Voicebox Unveiled as New Text-to-Speech Generative AI Model

Meta, the parent company of Facebook, has unveiled Voicebox, its latest AI tool. Voicebox is a generative artificial intelligence model designed to produce speech from text input. According to Meta, it delivers high-quality audio clips and can edit existing audio while retaining the original content and style.

One of the key features of Voicebox is its multilingual support. It has been trained to deliver speech in six languages: English, French, Spanish, German, Polish, and Portuguese.

The machine learning model powering Voicebox also offers a range of additional functionalities. It can effectively remove background noise from audio recordings, ensuring clearer and more focused speech output. This noise removal capability is particularly useful when dealing with recordings that may have interruptions such as car horns or barking dogs, allowing for smoother and uninterrupted audio playback.

Furthermore, Voicebox can edit pre-recorded audio. It can modify specific segments of an audio sample while keeping the overall content and style intact, and it can even replace misspoken words without the need to re-record the entire speech. This saves time and effort for content creators and speakers who want to make precise adjustments or corrections to their recordings.

The introduction of Voicebox is another notable addition to the growing field of generative text-to-speech models. It follows generative AI tools such as OpenAI's ChatGPT and DALL-E, which have demonstrated the potential of AI in language processing and image generation, respectively. With Voicebox, Meta extends that approach to generating and editing speech.

As the adoption of AI-powered tools continues to expand, Voicebox holds great promise for a wide range of applications. It can be employed in industries such as entertainment, voiceovers, audiobook production, virtual assistants, language learning, and more. By enabling efficient and high-quality speech generation, Meta’s Voicebox opens up exciting possibilities for content creators, businesses, and individuals seeking to enhance their audio experiences.

Voicebox can synthesise speech across six languages — English, French, Spanish, German, Polish, and Portuguese. It can create a reading of the text in any of those languages, even when the sample speech and the text are in different languages.

Meta claims that Voicebox outperforms Microsoft’s VALL-E and generates audio samples 20 times faster. “Our results show that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with 1 per cent error rate degradation as opposed to 45 to 70 per cent degradation with synthetic speech from previous text-to-speech models,” Meta AI detailed in a research paper. Meta has also published a few audio samples to demonstrate how Voicebox works.

In its blog post, Meta further claims that Voicebox can generate speech that is more representative of how people talk in the real world in the six supported languages. The company believes that this capability could be used to generate synthetic data to better train speech assistant models in the near future.
