Google is taking another stab at text-to-video generation with Lumiere, a new AI model that can create surprisingly high-quality content.
The tech giant has certainly come a long way since the days of Imagen Video. Subjects in Lumiere's videos are no longer nightmarish creatures with melting faces; the output now looks far more realistic. Sea turtles look like sea turtles, animal fur has the right texture, and the people in AI clips have genuine smiles (for the most part). There is also very little of the strange jerky motion seen in other generative text-to-video AIs; the movement is mostly smooth as butter. Inbar Mosseri, research team leader at Google Research, published a video on her YouTube channel demonstrating Lumiere's capabilities.
Google has put a lot of work into making Lumiere's content as lifelike as possible. The development team achieved this by implementing something called a Space-Time U-Net architecture (STUNet). The technology behind STUNet is quite complex, but as Ars Technica explains, it allows Lumiere to understand where objects are in a video, how they move and change, and to render all of those actions at the same time, resulting in a smooth clip.
This contrasts with other generative platforms, which generate keyframes first and then fill in the gaps between them, an approach that produces the jerky movement the technology is known for.
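To make the difference concrete, here is a minimal, illustrative sketch (not Google's code; all function names and the toy math are our own assumptions) contrasting the two strategies: filling in frames between sparse keyframes versus processing the whole clip as one downsampled space-time volume, which is the rough idea behind STUNet.

```python
# Toy illustration only: contrasts keyframe interpolation with a single
# joint space-time pass. Nothing here reflects Lumiere's actual model.
import numpy as np

T, H, W = 16, 32, 32  # frames, height, width of a toy clip


def keyframe_then_interpolate(noise: np.ndarray, stride: int = 4) -> np.ndarray:
    """Conventional approach: produce sparse keyframes, then fill the gaps by
    linear interpolation. The in-between frames are never shaped by the model
    itself, which is where jerky motion can creep in."""
    keyframes = noise[::stride]  # every `stride`-th frame stands in for a keyframe
    out = np.empty_like(noise)
    for i in range(len(keyframes) - 1):
        for j in range(stride):
            a = j / stride
            out[i * stride + j] = (1 - a) * keyframes[i] + a * keyframes[i + 1]
    out[(len(keyframes) - 1) * stride:] = keyframes[-1]
    return out


def space_time_single_pass(noise: np.ndarray) -> np.ndarray:
    """STUNet-style idea, very loosely: downsample the clip in BOTH space and
    time, process the compact space-time volume in one pass, then upsample.
    Every output frame comes from the same joint pass, so motion is handled
    globally rather than patched in afterwards."""
    # Downsample: average over 2x2x2 space-time blocks.
    ds = noise.reshape(T // 2, 2, H // 2, 2, W // 2, 2).mean(axis=(1, 3, 5))
    ds = np.tanh(ds)  # stand-in for the network's joint space-time processing
    # Upsample back to full temporal and spatial resolution in one shot.
    return ds.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)


rng = np.random.default_rng(0)
noise = rng.standard_normal((T, H, W))
print(keyframe_then_interpolate(noise).shape)  # (16, 32, 32)
print(space_time_single_pass(noise).shape)     # (16, 32, 32)
```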
Well equipped
In addition to text-to-video generation, Lumiere has plenty of features in its toolkit, including multimodality support.
Users can upload source images or videos to the AI so it can edit them according to their specifications. For example, you can upload Johannes Vermeer's Girl with a Pearl Earring and turn it into a short clip in which she laughs instead of staring blankly. Lumiere also has a feature called Cinemagraph, which can animate highlighted parts of images.
Google demonstrates this by selecting a butterfly sitting on a flower. Thanks to the AI, the output video shows the butterfly flapping its wings while the flowers around it remain still.
Things get especially impressive when it comes to video. Video Inpainting, another feature, works similarly to Cinemagraph in that the AI can edit parts of clips. A woman's green patterned dress, for example, can be changed to shiny gold or black. Lumiere goes one step further with Video Stylization, which changes a video's subject itself: an ordinary car driving down the road can be turned into a vehicle made entirely of wood or Lego bricks.
Still in the making
It is unknown whether Google plans to release Lumiere to the public or implement it as a new service.
We might see the AI appear on a future Pixel phone as the evolution of Magic Editor. If you’re not familiar with it, Magic Editor uses “AI processing (to) intelligently change” spaces or objects in photos on the Pixel 8. Video Inpainting seems to us to be a natural development for the technology.
For now, it looks like the team will keep it behind closed doors. As impressive as this AI may be, there are still problems: some animations are choppy, and in other cases subjects have limbs that warp into mush. If you would like to learn more, you can find Google's research paper on Lumiere on Cornell University's arXiv website. Be warned: it is a dense read.
And check out Ny Breaking’s roundup of the best AI art generators for 2024.