If you’ve been following the latest AI news, you know there are now chatbots you can talk to with your voice. OpenAI was one of the first to demonstrate the technology with ChatGPT Advanced Voice mode (currently only 10 minutes per month free), but Google was first to market with Gemini Live (now free for all Android users), and Microsoft recently followed suit by revamping the Copilot website and app (free for everyone) to include voice calling.
The ability to talk to an AI with our voice and have it talk back like a human has been a science fiction dream since Captain James T. Kirk addressed the ship’s computer in Star Trek. Later science fiction creations that proved indistinguishable from humans, such as HAL 9000 and the Blade Runner replicants, further fueled our imagination about the possibilities of an AI that could communicate like a person.
Now we seem to be living in that future, because you can already have a conversation with an AI on the very smartphone or computer you’re reading this on. But while we’ve made tremendous progress toward a human-like companion, there’s still a long way to go, as I recently discovered by putting the latest voice-enabled AIs – ChatGPT Advanced Voice mode, Gemini Live and Copilot – to the test over a few weeks. Here are my top three takeaways:
1. Interruptions are a good idea, but they don’t work well
The biggest problem I see with talking AIs is that you can’t interrupt them reliably – or they interrupt you when you don’t want them to. It’s great that ChatGPT, Gemini Live, and Copilot all let you interrupt, especially since they tend to give long, heavy answers to everything you ask, and without that ability you wouldn’t use them at all. In practice, though, the process is often flawed: they either miss your interruption, or they respond to it by talking even more. Usually it’s some version of, “Okay, what would you like to know instead?”, when all you want is for them to stop talking so you can start. The result is usually a messy series of stops and starts that undermines the natural flow of the conversation and keeps it from feeling human.
This week I found myself yelling “Stop talking!” at my phone a lot, just so I could get a word in edgewise – which is not a good look, especially because I spend most of the day in an office, surrounded by people.
Another problem I ran into with all of the chatbots is that they thought I was done talking when in fact I had only paused to gather my thoughts and was still halfway through a sentence. The whole experience has to be as smooth as butter for you to have confidence in it; otherwise the spell is broken.
2. There is not enough local information
Ask any of the current chatbots where the best place to get pizza nearby is, and – apart from Gemini Live – you’ll be told they can’t search the internet. Gemini Live is well ahead of the curve here: it will actually recommend a good pizza place. The recommendations aren’t bad, and while it can’t make a reservation for you, it does give you the restaurant’s phone number.
Voice-enabled chatbots obviously need to be able to search the web, just as text-based chatbots already can, but right now ChatGPT Advanced Voice mode and Copilot can’t, and that’s a major disadvantage when it comes to delivering relevant information.
3. They’re not personal enough
For voice AI to be useful, it needs to know a lot about you. It should also have access to your important apps, like your inbox and your calendar. That isn’t possible at the moment. Ask, “Hey, am I free at 4pm on Friday?”, or “When is the next family birthday coming up?”, and you’re told that isn’t possible yet – and without capabilities like that, the usefulness of voice AI falls off a cliff.
What is a talking AI good for?
Right now, the best use of voice AI is to ask questions, get some motivation to do something, or surface ideas you wouldn’t have thought of yourself. Pick a topic, let the AI talk, and you’ll find it knows a surprising amount about a lot of things. It’s fascinating! For example, one of the things I know a lot about is Brazilian Jiu-Jitsu, and I found I could have a pretty good conversation about it with each of the chatbots, even going into a surprising level of detail on techniques and positions. Based on my experience, Copilot gave me the best answers, while Gemini seemed the most likely to hallucinate things that weren’t true.
As far as the interface goes, I think ChatGPT leads the way. I really like the way the swirling sphere seems to pulse like a heartbeat in time with what you’re saying, giving you confidence that it’s really listening. Gemini Live, by contrast, shows a mostly dark screen with a glowing area at the bottom, leaving you without a focal point to look at and making for a slightly more soulless experience.
The AI you can now talk to is great for delving into research topics, but it also feels a bit half-finished, and it will need much deeper integration with our smartphones before it can perform at the level we naturally want it to. Of course it will get better over time. Right now, the elephant in the room is Apple Intelligence and its companion Siri, both of which are late to the party. We’re still waiting for an Apple Intelligence release date, and even then we won’t get the full, all-singing, all-dancing Siri until next year.
Right now, the promise of an AI we can talk to just like a friend – or a true virtual assistant – feels tantalizingly close, yet still far away.