I didn’t expect Google Glass to make a small comeback at Google I/O 2024, but it did thanks to Project Astra.
That’s Google’s name for a new prototype of AI agents, powered by Gemini multimodal AI, that can understand video and voice input, respond intelligently to what someone is actually looking at, and answer questions about it.
Described as a “universal AI” that could be “really useful in everyday life,” Project Astra is designed to be proactive, teachable, and able to understand natural language. And in a video, Google demonstrated this with a person using what looked like a Pixel 8 Pro running the Astra AI.
By pointing the phone’s camera around a room, the person could ask Astra to “tell me when you see something making a sound,” after which the AI highlighted a speaker it could see in the camera’s viewfinder. From there, the person could ask what a particular part of the speaker was, with the AI responding that the part in question was a tweeter, which handles high frequencies.
But Astra does much more: it can identify code on a monitor and explain what it’s doing, and it can work out where someone is in a city and provide a description of that area. If prompted, it could even create an alliterative phrase around a set of crayons in a way that sounds a bit Dr. Seuss-like.
Astra can even remember where the user left a pair of glasses, because it recalls where it last saw them. That’s possible because the AI is designed to continuously encode video frames of what it sees, combine that video with voice input into a timeline of events, and cache that information so it can be recalled quickly later.
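Google hasn’t published how that timeline works under the hood, but the general idea of encoding frames, pairing them with what was said, and caching the result for fast recall can be sketched roughly in code. Everything below (the class names, the embedding and transcript placeholders) is a hypothetical illustration, not Google’s implementation:

```python
from dataclasses import dataclass
from collections import deque
import time


@dataclass
class TimelineEvent:
    timestamp: float
    frame_embedding: list[float]  # stand-in for an encoded video frame
    transcript: str               # voice/visual context noted around this frame


class EventTimeline:
    """Rolling cache of recent events so earlier observations can be recalled quickly."""

    def __init__(self, max_events: int = 1000):
        # A bounded deque drops the oldest events automatically.
        self.events = deque(maxlen=max_events)

    def record(self, frame_embedding: list[float], transcript: str) -> None:
        self.events.append(TimelineEvent(time.time(), frame_embedding, transcript))

    def recall(self, keyword: str) -> TimelineEvent | None:
        # Walk backwards to find the most recent event mentioning the keyword,
        # e.g. recall("glasses") to answer "where did I leave my glasses?"
        for event in reversed(self.events):
            if keyword.lower() in event.transcript.lower():
                return event
        return None
```

In that sketch, the “where are my glasses?” trick reduces to something like `timeline.recall("glasses")`: the system walks back through cached events to the last frame in which the glasses were seen, rather than re-processing the whole video.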
When the demo then switched to a person wearing Google Glass “smart glasses,” Astra could see that the person was looking at a diagram of a system on a whiteboard and, when asked, figure out where optimizations could be made.
Such capabilities suddenly make Glass seem genuinely useful, rather than the somewhat creepy and arguably pointless device it was a handful of years ago; perhaps we’ll see Google return to the smart glasses arena next.
Project Astra is able to do all this thanks to its use of multimodal AI, which in simple terms is a mix of neural network models that can process data and input from multiple sources; think of mixing information from cameras and microphones with knowledge on which the AI has already been trained.
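To make “multimodal” a little more concrete: rather than running separate vision and speech models and stitching their outputs together, a multimodal model accepts several kinds of input in a single request and reasons over them jointly. The snippet below is a generic, hypothetical sketch of that pattern; the class and method names are invented and do not reflect the actual Gemini or Astra APIs:

```python
from dataclasses import dataclass


@dataclass
class MultimodalRequest:
    image_bytes: bytes     # a camera frame from the phone or glasses
    audio_transcript: str  # what the user just asked about that frame


class FakeMultimodalModel:
    """Stand-in for a multimodal model; a real one would return a grounded answer."""

    def generate(self, parts: list) -> str:
        return f"(model would answer using {len(parts)} input parts here)"


def ask(model: FakeMultimodalModel, request: MultimodalRequest) -> str:
    # Both modalities go into one call, so the model can ground its answer
    # in what the camera currently sees as well as in what was asked.
    return model.generate([request.image_bytes, request.audio_transcript])


if __name__ == "__main__":
    model = FakeMultimodalModel()
    print(ask(model, MultimodalRequest(b"<camera frame>", "what part of the speaker is this?")))
```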
Google didn’t say when Project Astra will appear in products, or even in the hands of developers, but Google DeepMind CEO Demis Hassabis said that “some of these capabilities will be coming to Google products later this year, like the Gemini app.” I’ll be very surprised if that doesn’t mean the Google Pixel 9, which we expect to arrive later this year.
It’s worth bearing in mind that Project Astra was shown off in a very slick video, and the reality of such built-in AI agents is that they can suffer from latency. But it’s a promising look at how Google is likely to actually integrate useful AI tools into its future products.