ChatGPT’s advanced voice mode is sometimes frighteningly good. Are we looking at the future of AI interactions?
ChatGPT got a nifty new advanced voice mode earlier this week. It’s only rolling out to a small number of paying subscribers at the moment (it’s in alpha testing), but we’ve already seen a few teasers of the feature in action.
These are popping up online, on YouTube and X, posted by the lucky few ChatGPT Plus users who have access to the feature and are demonstrating it across a range of tasks. These include requests to sing a song in a particular style, to imitate accents, and to teach the nuances of correct pronunciation in another language.
If you recall, this functionality was first revealed at the launch of GPT-4o a few months ago. Advanced Voice Mode was then delayed, apparently so OpenAI could tighten up safety around the feature, but it’s here now and very much in action, with some impressive results to boot.
For example, The Verge points to a YouTube clip in which ChatGPT gives a user a lesson in pronouncing French words, and the AI proves genuinely helpful.
Here’s another example: a request to sing ‘Happy Birthday’ in a ‘soulful blues’ style. Or how about ChatGPT telling some jokes in different voices (shy, angry)?
ChatGPT Advanced Voice Mode counts to 10 as fast as it can, then to 50 (this blew me away – it stopped to catch its breath like a human would) pic.twitter.com/oZMCPO5RPh July 31, 2024
Finally, check out the posts above and below from X, which show ChatGPT’s advanced voice mode counting at speed and trying out regional American accents.
ChatGPT’s advanced voice mode tries out different regional accents in the US. pic.twitter.com/UvDeQUNHLp July 31, 2024
If you’re interested in getting in on the action yourself, we’ve heard from OpenAI that all ChatGPT Plus subscribers will be getting Advanced Voice Mode later this year. The full rollout should be complete by “late fall,” so in theory every Plus subscriber should have it by December.
Analysis: 50 Shades of Cool
If you’ve watched the demos above: pretty cool, huh? If not, do check them out.
A lot of attention to detail has gone into making the advanced voice mode seem more human and realistic. For example, notice how the super-fast count to 50 is made to sound effortful, complete with a pause for breath. A really nice touch.
Or take the blues singing excursion. It isn’t just about the singing itself, which is certainly well done, but also the in-depth explanation of how a singer might approach the song, and the natural style and delivery of the AI voice here (and elsewhere). These interactions reach new heights of realism, even if there are still wrinkles to iron out.
On that last point, we weren’t so impressed with the American accents, although that was a tall order, and they improved slightly when the user asked ChatGPT to exaggerate them. And while the AI’s responses are generally quick, fluid, and to the point, the odd moment of silence or confusion crops up when you watch a series of these clips.
Keep in mind that Advanced Voice Mode is still in alpha, which makes what we’re seeing quite impressive under the circumstances, and remarkably good in some scenarios. This could be one of those areas where AI moves so fast it’s scary…