Sora is ChatGPT maker OpenAI’s new text-to-video generator. Here’s what we know about the tool so far

NEW YORK — The creator of ChatGPT is now diving into AI-generated video.

Introducing Sora: OpenAI’s new text-to-video generator. The tool, which the San Francisco company unveiled Thursday, uses generative artificial intelligence to instantly create short videos based on written commands.

Sora isn’t the first to demonstrate this kind of technology. But industry analysts point to the high quality of the tool’s videos so far, noting that its introduction represents a significant leap forward for both OpenAI and text-to-video generation more broadly.

Yet, as with all things in today’s burgeoning AI space, such technology also raises fears about potential ethical and societal implications. Here’s what you need to know.

Sora is a text-to-video generator that creates videos up to 60 seconds long from written prompts using generative AI. The model can also generate video from an existing still image.

Generative AI is a branch of artificial intelligence that can produce new content, such as text and images, in response to prompts. Examples include chatbots, such as OpenAI’s ChatGPT, and image generators such as DALL-E and Midjourney. Getting an AI system to generate videos is newer and more challenging, but it’s based on some of the same technology.

Sora isn’t yet available for public use (OpenAI says it’s in discussions with policymakers and artists before officially releasing the tool) and there’s a lot we still don’t know. But since Thursday’s announcement, the company has shared a handful of examples of Sora-generated videos to show what it can do.

Sam Altman, CEO of OpenAI, also took to X, the platform formerly known as Twitter, to ask social media users to submit quick ideas. He later shared realistically detailed videos responding to prompts like “two golden retrievers podcasting on the top of a mountain” and “a bike race on the ocean with various animals as athletes cycling with drone camera view.”

While Sora-generated videos can render complex, incredibly detailed scenes, OpenAI notes that the model still has weaknesses, including with spatial details and cause-and-effect relationships. For example, OpenAI adds on its website: “A person may take a bite out of a cookie, but afterward the cookie may no longer have a bite mark.”

OpenAI’s Sora isn’t the first of its kind. Google, Meta and the startup Runway ML are among companies that have demonstrated similar technology.

Still, industry analysts are highlighting the apparent quality and impressive length of the Sora videos shared so far. Fred Havemeyer, head of US AI and software research at Macquarie, said the launch of Sora represents a major step forward for the industry.

“Not only can you create longer videos, I understand they can be up to 60 seconds, but also the videos that are created look more normal and actually seem to respect physics and the real world more,” Havemeyer said. “You don’t get as many ‘uncanny valley’ videos or clips on the video feeds that look… unnatural.”

While “tremendous progress” has been made in AI-generated video over the past year, including the introduction of Stable Video Diffusion last November, Forrester senior analyst Rowan Curran said such videos have required more “stitching together” to maintain character and scene consistency.

However, the consistency and length of Sora’s videos represent “new opportunities for creatives to incorporate elements of AI-generated video into more traditional content, and now even to generate full-fledged narrative videos from one or a few prompts,” Curran told The Associated Press via email Friday.

While Sora’s capabilities have wowed observers since Thursday’s launch, concerns also remain about the ethical and societal implications of AI-generated video.

Havemeyer, for example, pointed to the significant risks in the potentially fraught 2024 election cycle. Having a “potentially magical” way to create videos that look and sound realistic poses a number of problems within politics and beyond, he added, pointing to concerns about fraud, propaganda and disinformation.

“The negative externalities of generative AI will be a crucial topic of debate in 2024,” Havemeyer said. “It’s a substantial issue that every company and every person will have to deal with this year.”

Technology companies continue to lead the charge when it comes to managing AI and its risks, while governments around the world are trying to catch up. In December, the European Union agreed on the world’s first comprehensive AI rules, but the law will not come into effect for two years after final approval.

On Thursday, OpenAI said it is taking important safety steps before making Sora widely available.

“We are working with red teamers – domain experts in areas such as disinformation, hate content and bias – who will test the model in an adversarial manner,” the company wrote. “We are also developing tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.”

OpenAI’s Vice President of Global Affairs Anna Makanju reiterated this when she spoke Friday at the Munich Security Conference, where OpenAI and 19 other tech companies pledged to voluntarily work together to combat AI-generated deepfakes in elections. She noted that the company released Sora “in a way that is quite cautious.”

At the same time, OpenAI has revealed limited information about how Sora is built. OpenAI’s technical report did not disclose which image and video sources were used to train Sora – and the company did not immediately respond to a request for further comment on Friday.

The Sora release also comes against the backdrop of lawsuits against OpenAI and its business partner Microsoft by some authors and The New York Times over the use of copyrighted writing to train ChatGPT. OpenAI pays an undisclosed fee to the AP to license its text news archive.

___

O’Brien reported from Providence, Rhode Island.