What is Sora from OpenAI? The text-to-video tool explained and when you might use it

ChatGPT maker OpenAI has now revealed Sora, an artificial intelligence model that turns text prompts into video. Think of Dall-E (also developed by OpenAI), but for movies instead of static images.

It’s still early days for Sora, but the AI model is already making waves on social media, with several clips making the rounds – clips that look like they were put together by a team of actors and filmmakers.

Here we explain everything you need to know about OpenAI Sora: what it’s capable of, how it works, and when you might be able to use it yourself. The era of AI text-driven movies is now here.

OpenAI Sora release date and price

In February 2024, OpenAI Sora was made available to “red teamers” – that is, people whose job it is to test the security and stability of a product. OpenAI has now also invited a select number of visual artists, designers and filmmakers to test out its video generation capabilities and provide feedback.

“We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities lie ahead,” says OpenAI.

In other words, the rest of us can’t use it yet. For now, there’s no indication of when Sora might be available to the wider public, or how much we’ll have to pay to access it.

(Image credit: OpenAI)

We can make some rough estimates about the timescale based on what happened with ChatGPT. Before that AI chatbot was released to the public in November 2022, it was preceded earlier that year by InstructGPT. Additionally, OpenAI’s DevDay conference typically takes place in November.

So it’s certainly possible that Sora could follow a similar pattern and reach the public at the same time of year in 2024. But this is just speculation for now, and we’ll update this page as soon as we get a clearer indication of a Sora release date.

As for price, we also don’t have any indication of how much Sora might cost. By comparison, ChatGPT Plus – which offers access to the latest Large Language Models (LLMs) and Dall-E – currently costs $20 (about £16 / AU$30) per month.

But Sora requires significantly more computing power than, say, generating a single image with Dall-E, and the process takes longer too. So it’s not yet clear how Sora – which for now is essentially a research project – could be turned into an affordable consumer product.

What is OpenAI Sora?

You may be familiar with generative AI models – such as Google Gemini for text and Dall-E for images – that can produce new content after being trained on vast amounts of data. For example, if you ask ChatGPT to write a poem, what you get back is based on the many poems the AI has already analyzed.

OpenAI Sora is a similar idea, but for video clips. You give it a text prompt, like “woman walking down a city street at night” or “car driving through a forest”, and you get a video clip back. As with AI image models, you can be very specific about what should appear in the clip and the style of footage you want to see.

To get a better idea of how this works, check out some of the example videos posted by OpenAI CEO Sam Altman – not long after Sora was revealed to the world, Altman responded to prompts on social media, sending back clips generated from text like “a wizard wearing a pointy hat and a blue robe with white stars casting a spell that shoots lightning from his hand and holding an old tome in his other hand”.

How does OpenAI Sora work?

On a simplified level, the technology behind Sora is the same kind of technology that lets you search the internet for images of a dog or a cat. Show an AI enough photos of a dog or a cat and it will be able to recognize the same patterns in new images; likewise, if an AI is trained on a million videos of a sunset or a waterfall, it can generate its own versions.
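If you want a feel for that “learn from many examples, then recognize the patterns in new ones” idea, here’s a minimal sketch using scikit-learn’s bundled digits dataset as a stand-in for dog and cat photos. It has nothing to do with Sora’s actual training pipeline; it just shows the same principle on a tiny scale.

```python
# A minimal sketch of "show it enough labelled examples and it can recognize
# new ones", using scikit-learn's digits dataset as a stand-in for photos.
# Purely illustrative - this is not how Sora is trained.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()  # ~1,800 small labelled images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=2000)  # learns pixel patterns per class
model.fit(X_train, y_train)                # "show it enough examples..."

print(model.predict(X_test[:5]))           # ...and it labels images it has never seen
print("accuracy:", model.score(X_test, y_test))
```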

Of course, there’s a lot of complexity underneath, and OpenAI has published a deep dive into how the AI model works. It’s trained on internet-scale data to learn what realistic videos look like, first analyzing clips to understand what it’s looking at, then learning how to produce its own versions when asked.

So ask Sora for a video of an aquarium and it will come back with an approximation based on all the aquarium videos it has seen. It uses so-called visual patches – smaller building blocks that help the AI understand what should go where, and how the different elements of a video should interact and evolve from frame to frame.
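OpenAI hasn’t published Sora’s exact patch sizes, but the general idea of chopping footage into small spacetime blocks can be sketched in a few lines of NumPy. The frame count and patch dimensions below are illustrative guesses, not Sora’s real configuration.

```python
# A rough sketch of the "visual patches" idea: split a video into small
# spacetime blocks (frames x height x width) that a model can treat as tokens.
# All sizes here are made-up placeholders, not Sora's actual settings.
import numpy as np

video = np.random.rand(16, 64, 64, 3)  # 16 frames of 64x64 RGB, standing in for real footage

T, H, W = 4, 16, 16                    # hypothetical patch size: 4 frames x 16x16 pixels
patches = (
    video.reshape(16 // T, T, 64 // H, H, 64 // W, W, 3)
         .transpose(0, 2, 4, 1, 3, 5, 6)      # group values by patch position
         .reshape(-1, T * H * W * 3)          # flatten each patch into one token vector
)

print(patches.shape)  # (64, 3072): 64 patch "tokens", each a 3,072-value vector
```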

Sora starts out messier and then gets neater (Image credit: OpenAI)

Sora is based on a diffusion model, where the AI starts with a ‘noisy’ response and then works towards a ‘clean’ output through a series of feedback loops and prediction calculations. You can see this in the frames above, where a video of a dog playing in the snow goes from nonsensical blobs to something that actually looks realistic.
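Here’s a toy illustration of that denoising loop. In a real diffusion model the noise is predicted by a trained neural network over many carefully scheduled steps; this sketch simply cheats by knowing the ‘clean’ target, so you can watch the output converge.

```python
# A toy illustration of the diffusion idea: start from pure noise and take
# repeated small steps towards a "clean" target. The target here is a fixed
# vector standing in for a video; a real model would *predict* the noise
# with a neural network rather than knowing the answer in advance.
import numpy as np

rng = np.random.default_rng(0)
target = np.linspace(0.0, 1.0, 8)   # stand-in for the "clean" output
x = rng.normal(size=8)              # start from random noise

for step in range(50):
    predicted_noise = x - target    # cheat: a real model would predict this
    x = x - 0.1 * predicted_noise   # denoise a little each step
    if step % 10 == 0:
        print(step, np.abs(x - target).mean())  # error shrinks as the output "cleans up"
```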

And like other generative AI models, Sora uses transformer technology (the last T in ChatGPT stands for Transformer). Transformers use a variety of sophisticated data analysis techniques to process huge amounts of data – they can weigh up the most and least important parts of what’s being analyzed, and figure out the surrounding context and the relationships between those chunks of data.
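As a rough sketch of what that weighing-up means in practice, the snippet below implements plain scaled dot-product attention over a set of made-up patch tokens. The shapes and weights are arbitrary placeholders; Sora’s actual architecture details haven’t been published.

```python
# A minimal sketch of the attention mechanism at the heart of transformers:
# each "patch token" scores its relevance to every other token, so important
# parts of the input get weighted more heavily. Shapes are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 128))       # 64 patch tokens, 128 dimensions each

Wq, Wk, Wv = (rng.normal(size=(128, 128)) for _ in range(3))
Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv

scores = softmax(Q @ K.T / np.sqrt(128))  # how strongly each token attends to the others
context = scores @ V                      # tokens blended according to relevance

print(context.shape)                      # (64, 128)
```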

What we don’t quite know is where OpenAI got its training data from – it hasn’t said which video libraries have been used to power Sora, although we do know it has partnerships with content databases such as Shutterstock. In some cases it’s possible to see similarities between the training data and the output Sora produces.

What can you do with OpenAI Sora?

Currently, Sora is capable of producing HD videos of up to one minute, without any sound, via text prompts. If you want to see some examples of what’s possible, we’ve put together a list of 11 stunning Sora shorts for you to check out – including fluffy Pixar-style animated characters and astronauts in knitted helmets.

“Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt,” says OpenAI, but that’s not all. It can also generate videos from still images, fill in missing frames in existing videos, and seamlessly stitch multiple videos together. It can also create static images, or produce endless loops from supplied clips.

It can even produce simulations of video games such as Minecraft – again based on vast amounts of training data, from which it learns what a game like Minecraft should look like. We’ve already seen a demo in which Sora controls a player in a Minecraft-like environment while accurately rendering the surrounding details.

OpenAI acknowledges some of Sora’s current limitations. The physics isn’t always logical, with people disappearing, transforming, or merging into other objects. Sora doesn’t map out a scene with individual actors and props; instead, it makes an enormous number of calculations about where pixels should go from frame to frame.

In Sora videos, people may move in ways that defy the laws of physics, or details (such as a bite taken out of a cookie) may not persist from one frame to the next. OpenAI is aware of these issues and is working to fix them. You can view some examples on the OpenAI Sora website to see what we mean.

Despite these flaws, OpenAI hopes that Sora could eventually evolve into a realistic simulator of physical and digital worlds. In the years ahead, the technology behind Sora could be used to generate imaginary virtual worlds for us to explore, or to let us fully explore real places that have been recreated in AI.

How can you use OpenAI Sora?

Right now, you can’t get into Sora without an invite: OpenAI appears to be handpicking individual creators and testers to help get its video-generating AI model ready for a full public release. How long this preview period will last, whether months or years, remains to be seen – but OpenAI has previously shown a willingness to move as fast as possible with its AI projects.

Based on the existing technologies that OpenAI has made public – Dall-E and ChatGPT – it seems likely that Sora will initially be available as a web app. Since launch, ChatGPT has gotten smarter and added new features, including custom bots, and it’s likely that Sora will follow the same path when it fully launches.

Before that happens, OpenAI says it wants to put some guardrails in place: you won’t be able to generate videos showing extreme violence, sexual content, hate speech, or celebrity likenesses. There are also plans to combat misinformation by including metadata in Sora videos that identifies them as AI-generated.
