Why ChatGPT's New Ability to Speak Could Change Everything

More ways to communicate

  • OpenAI launched a revamped version of its chatbot that can converse with users. 
  • ChatGPT can now comprehend spoken words, respond with an artificial voice, and evaluate pictures.
  • The chatbot’s new abilities could make technology more inclusive.
Image: A programmer using an AI chatbot to generate software code (cofotoisme / Getty Images).

Talking to your chatbot is now a thing, and it could revolutionize how we interact with artificial intelligence (AI). 

OpenAI has released a new version of its chatbot that can talk to people. ChatGPT now has the ability to "see, hear, and speak." The bot can understand spoken language, reply using a synthetic voice, and analyze images.

"Interacting with AI chatbots using spoken words fosters a sense of natural communication, catering to our innate human preference for verbal exchange," Proto Inc.'s AI lead, Raffi Kryszek, told Lifewire via email. "This mode of interaction is not only often faster than typing but also heightens convenience, especially on devices or in settings where typing isn't feasible."

Chatting With Your Bot

The update, OpenAI's biggest since GPT-4, lets users hold voice conversations in the ChatGPT mobile app. Users can choose from five synthetic voices for the chatbot. They can also show pictures to ChatGPT, using a feature called GPT-4-Vision, and highlight specific areas for it to examine or discuss.

"Snap a picture of a landmark while traveling and have a live conversation about what's interesting about it," the company wrote on its website. "When you're home, snap pictures of your fridge and pantry to figure out what's for dinner (and ask follow-up questions for a step-by-step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you."

The voice feature is powered by a new text-to-speech model that, according to OpenAI, can generate human-like speech from just a few seconds of sample audio. ChatGPT's updated voice function can tell bedtime stories, help settle dinner-table debates, and read users' typed text aloud.

OpenAI has acknowledged the risk that this technology could be used for impersonation or fraud. To limit that risk, the company said ChatGPT will use only the preset voices already in the system, which the company has approved in advance.

Newer chatbots such as OpenAI's ChatGPT are much better at carrying on conversations and understanding users' instructions than the previous generation of assistants like Alexa, Siri, and Google Assistant, Chris Callison-Burch, a professor of computer and information science at the University of Pennsylvania, said in an email. "I expect a rapid leap forward in smart assistants as they incorporate generative AI technology."


The upgraded version of ChatGPT will roll out to Plus and Enterprise users on mobile platforms in the next two weeks, with follow-on access for developers and other users "soon after."

ChatGPT's voice feature could be useful for children, Callison-Burch suggested. He said his children used Amazon Alexa to search the internet.

"My kids asked Alexa science questions like, 'How many teeth do snails have?' or 'Can turtles breathe through their butts?' and quizzed it about Pokémon," he added. "They used it to teach themselves interesting math facts (one of my kids can count by 1000s to novemtrigintillion because of Alexa)."

Callison-Burch said he has had early access to GPT-4-Vision and found it "incredibly impressive." 

"I have used it to describe photographs, figures in scientific papers, and even fine art paintings," he added. "Its descriptions are exceptionally good, and you can have a conversation with it about the images, asking questions and having it answer them."

Image: A 3D graphic of a brain representing artificial intelligence (Mr.Cole_Photographer / Getty Images).

The Future of AI?

ChatGPT's expanded multimodal capabilities arrive shortly after OpenAI announced DALL-E 3, its latest and most sophisticated image-generation system.

According to OpenAI, DALL-E 3 also understands natural language, so users can converse with it to refine results and work with ChatGPT to help write image prompts.

In the not-so-distant future, voice-activated AI chatbots will be able to understand diverse accents and languages, making technology more inclusive and universal, Kryszek said. 

"This evolution will be coupled with the capability to sense emotions from our voice's subtle cues, creating more empathetic digital assistants," he added. "These advancements are poised to permeate every facet of our lives—from wearables to vehicles, underpinned by robust voice biometrics ensuring utmost security. And as these systems mature, we'll witness a blend of voice, visuals, and tactile feedback, ushering in a new era of immersive and multi-dimensional digital interactions."
