With OpenAI’s announcement that it is adding speech and image capabilities, ChatGPT is becoming much more than a text-based chatbot.
Since its release less than a year ago, the enormously popular generative AI assistant has become one of the biggest technology success stories of recent times. It lets anyone write essays, poetry, and summaries from simple text prompts. Now that users will be able to speak with the chatbot through its mobile apps, ChatGPT is set to become far more versatile.
The announcement coincided with Amazon’s pledge to invest up to $4 billion in OpenAI rival Anthropic. The move is part of a wider race among global tech giants in generative AI, in which Google is trying to catch up with its Bard chatbot, Meta is leaning on a strong open-source approach to gain an edge, and Microsoft is aligning itself closely with OpenAI.
How to use ChatGPT with voice conversations
By combining its powerful large language models (LLMs) with the familiar format of voice assistants, OpenAI has taken a significant step forward for generative AI.
OpenAI announced the update on Twitter on September 25, 2023: “ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms).” https://t.co/uNZjgbR5Bm
A user may, for example, ask ChatGPT to create a bedtime story on the spot while providing a few vocal cues to direct the narrative. Alternatively, the user can just ask ChatGPT a question, and it will respond by speaking its answer.
ChatGPT users will also be able to include photos in their conversations. For example, they can upload a picture of something and ask ChatGPT to describe it or to give them instructions for achieving a goal.
The voice feature is powered by a new text-to-speech model that can generate human-like audio from text and just a few seconds of sample speech. OpenAI said it worked with professional voice actors to create five distinct voices, and that it uses its open-source Whisper speech recognition system to transcribe users’ spoken words into text.
Spotify, a launch partner, also introduced a new feature that lets podcasters sample their voice and translate their shows from English into Spanish, French, or German while keeping their own distinctive voices. OpenAI appears to be proceeding carefully to avoid criticism: rather than opening this voice technology to everyone, it has worked directly with selected podcasters for the launch, including Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett.
“The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications,” the company wrote in a blog post. “However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.”
The new features will roll out to paying Plus and Enterprise subscribers over the next two weeks. To enable voice, users open the Settings menu in the app, select New Features, and opt in to voice conversations. They then tap the headphone button in the top-right corner and choose the voice they want.
Voice will initially be available only as an opt-in beta feature in the ChatGPT Android and iOS apps, while image support will be available by default on all platforms.
Adding speech and image capabilities to ChatGPT is a significant development for generative AI. It lets ChatGPT interact with users more naturally and opens up new uses. For example, ChatGPT could power virtual assistants that understand and respond to spoken commands, or educational tools that help students learn concepts by working with images and other multimedia content.
It is still too early to say exactly what the future holds for ChatGPT, but the new speech and image capabilities are a major step forward. Because it can already generate text, translate languages, write many kinds of creative content, and answer questions, ChatGPT has the potential to become a powerful tool for a wide range of applications.
This is a massive update from OpenAI that should solve many problems and leave more room for creativity. We hope it will make things easier for users who rely on ChatGPT for creative work.