Do AI video-generators dream of San Pedro? Madonna among early adopters of AI’s next wave

Whenever Madonna sings the ’80s hit “La Isla Bonita” during her concert tour, moving images of swirling, sunset-hued clouds play on the giant arena screens behind her.

To achieve that ethereal look, the pop legend embraced a little-known branch of generative artificial intelligence: the text-to-video tool. Type a few words – for example ‘surreal cloud sunset’ or ‘waterfall in the jungle at sunrise’ – and an instant video is created.

Following in the footsteps of AI chatbots and still-image generators, some AI video enthusiasts say the emerging technology could one day upend entertainment, letting you choose your own movie with customizable storylines and endings. But the tools have a long way to go before they can do that, with plenty of ethical pitfalls along the way.

For early adopters like Madonna, who has long pushed the boundaries of art, it was more of an experiment. She scrapped an earlier version of the “La Isla Bonita” concert visuals that used more conventional computer graphics to evoke a tropical atmosphere.

“We tried CGI. It looked kind of boring and cheesy and she didn’t like it,” said Sasha Kasiuha, content director of Madonna’s Celebration Tour, which continues until the end of April. “And then we decided to try AI.”

ChatGPT maker OpenAI gave a glimpse of what advanced text-to-video technology could look like when the company recently showed off Sora, a new tool that isn’t yet publicly available. Madonna’s team instead tried a product from New York-based startup Runway, which helped pioneer the technology by releasing its first public text-to-video model last March. The company released a more advanced “Gen-2” version in June.

Cristóbal Valenzuela, CEO of Runway, said that while some see these tools as a “magical device where you type a word and somehow conjure up exactly what you had in mind,” the most effective approaches come from creative professionals looking for an upgrade to the decades-old digital editing software they already use.

He said Runway can’t make a full-length documentary yet. But it can help to fill in some background video or b-roll: the supporting shots and scenes that help tell the story.

“That might save you a week of work,” Valenzuela said. “The common thread across many use cases is that people use it as a way to extend or accelerate something they could have done before.”

Runway’s target audiences are “major streaming companies, production companies, post-production companies, visual effects companies, marketing teams, advertising companies. A lot of people who create content for a living,” Valenzuela said.

Dangers await. Without effective safeguards, AI video generators could threaten democracies with convincing “deepfake” videos of things that never happened, or, as is already the case with AI image generators, flood the internet with fake pornographic scenes depicting what appear to be real people with recognizable faces. Under pressure from regulators, major tech companies have pledged to watermark AI-generated output to help identify what is real.

There are also ongoing copyright disputes over the video and image collections on which the AI systems are trained (neither Runway nor OpenAI discloses its data sources) and over the extent to which the systems unfairly replicate copyrighted works. And there are fears that video-making machines could eventually replace human jobs and artistry.

For now, the longest AI-generated video clips are still measured in seconds and may contain jerky movements and telltale glitches such as deformed hands and fingers. Solving that problem is “just a matter of more data and more training,” and the computing power on which that training depends, says Alexander Waibel, a professor of computer science at Carnegie Mellon University who has been researching AI since the 1970s.

“Now I can say, ‘Make me a video of a rabbit dressed as Napoleon walking through New York City,’” Waibel said. “It knows what New York City looks like, what a rabbit looks like, what Napoleon looks like.”

That’s impressive, he said, but still far from a compelling storyline.

Before releasing its first-generation model last year, Runway claimed AI fame as co-developer of the Stable Diffusion image generator. Another company, London-based Stability AI, has since taken over Stable Diffusion’s development.

The underlying “diffusion model” technology behind most leading AI image and video generators works by mapping noise, or random data, onto images, effectively destroying an original image, and then predicting what a new one should look like. It borrows an idea from physics that can be used to describe, for example, how gas diffuses outward.

“What diffusion models do is they reverse that process,” says Phillip Isola, an associate professor of computer science at the Massachusetts Institute of Technology. “They take some kind of randomness and coagulate it back into the volume. That’s the way to go from randomness to substance. And so you can make random videos.”
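The two directions Isola describes, adding noise and then taking it back out, can be illustrated with a toy sketch. It assumes the widely published DDPM-style formulation with a linear noise schedule and illustrative parameter values; the function names are hypothetical, a “picture” is just a short list of pixel brightnesses, and where a real generator would use a trained neural network to predict the noise, this sketch cheats by reusing the true noise. It is not how Runway, OpenAI or Stability AI implement their systems.

```python
import math
import random


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Fraction of the original signal's variance left after t noising steps.

    Assumes a linear schedule of per-step noise amounts (betas); the
    values are illustrative. alpha_bar(0) is 1.0, i.e. no noise yet.
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, rng):
    """Forward process: jump straight to step t by blending signal with Gaussian noise."""
    ab = alpha_bar(t)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * n for x, n in zip(x0, noise)]
    return xt, noise


def denoise_estimate(xt, predicted_noise, t):
    """Reverse direction: given a noise prediction, estimate the clean signal.

    A real model learns predicted_noise from data; here it is supplied directly.
    """
    ab = alpha_bar(t)
    return [(x - math.sqrt(1.0 - ab) * n) / math.sqrt(ab)
            for x, n in zip(xt, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    pixels = [0.2, 0.5, 0.9, 0.1]  # a hypothetical four-pixel "image"
    for t in (10, 500, 1000):
        xt, noise = add_noise(pixels, t, rng)
        recovered = denoise_estimate(xt, noise, t)
        # The signal weight shrinks toward zero as t grows: the image dissolves into noise.
        print(t, round(math.sqrt(alpha_bar(t)), 4), [round(v, 3) for v in recovered])
```

By the last step almost nothing of the original survives, which is why a generator can start from pure randomness and work backward; the hard part, which this sketch skips, is training a network to predict the noise well.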

Generating video is more complicated than generating still images because it must take into account temporal dynamics, or how elements in the video change over time and across sequences of frames, says Daniela Rus, another MIT professor, who leads the Computer Science and Artificial Intelligence Laboratory.

Rus said the computing resources required are “significantly higher than for still image generation” because “it involves processing and generating multiple frames for every second of video.”

That hasn’t stopped some well-heeled tech companies from trying to outdo each other by showing off higher-quality AI video generation at longer durations. Requiring written descriptions to create an image was just the beginning. Google recently demonstrated a new project called Genie, which can be prompted to transform a photo or even a sketch into “an endless variety” of explorable video game worlds.

In the near term, AI-generated videos are likely to pop up in marketing and educational content, offering a cheaper alternative to producing original footage or acquiring stock footage, says Aditi Singh, a researcher at Cleveland State University who has studied the text-to-video market.

When Madonna first talked to her team about AI, the “main intention wasn’t, ‘Oh, look, it’s an AI video,’” says Kasiuha, the content director.

“She asked me, ‘Can you just use one of those AI tools to sharpen the photo, make sure it looks current and has high resolution?’” Kasiuha said. “She likes it when you introduce new technology and new types of visual elements.”

Longer AI-generated films are already being made. Runway hosts an annual AI film festival to showcase such works. But whether that’s what human audiences will choose to watch remains to be seen.

“I still believe in people,” says Waibel, the CMU professor. “I still believe that it will ultimately become a symbiosis where you have some AI proposing something and a human improving or guiding it. Or the humans will do it and the AI will solve it.”


Associated Press journalist Joseph B. Frederick contributed to this report.