By Nina McCambridge

You believe that you deserve what you get. If you’ve lived all your life in San Francisco like I have, you don’t spend your autumn longing for the colors of Vermont. Neither do you spend your winter gloating about how cold Vermonters must be feeling right now. You compare the weather to what it was yesterday. Accustomed to the unbelievable power of generative AI, you probably didn’t even notice when, last month, OpenAI announced a truly amazing video generator, Sora.
OpenAI says that “whereas LLMs have text tokens, Sora has visual patches,” pieces of a compressed version of the video segmented by both space and time. OpenAI says that their “patch-based representation enables Sora to train on videos and images of variable resolutions, durations, and aspect ratios,” whereas “previous video models have been restricted by resolution, duration, or aspect ratio.” OpenAI says that since Sora is a diffusion model, “given input noisy patches (and conditioning information like text prompts), it’s trained to predict the original ‘clean’ patches.” Sora “builds on past research in DALL-E and GPT models,” using a transformer architecture just as those earlier products do.
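To make that quoted description a little more concrete, here is a minimal sketch, in PyTorch, of the training objective it implies: compress a clip into spacetime patches, add noise, and train a transformer to recover the clean patches given a text condition. Every choice below (shapes, layer sizes, how the prompt is injected) is an illustrative assumption, not Sora’s actual architecture.

```python
# A minimal sketch (not OpenAI's code) of diffusion training over "spacetime patches":
# noise is added to the patches, and a transformer learns to predict the clean ones
# given the noisy patches plus a text conditioning vector.
import torch
import torch.nn as nn

class PatchDenoiser(nn.Module):
    def __init__(self, patch_dim=256, text_dim=256, n_layers=4, n_heads=8):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, patch_dim)
        layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(patch_dim, patch_dim)

    def forward(self, noisy_patches, text_embedding):
        # Prepend the text condition as an extra token so every patch can attend to it.
        cond = self.text_proj(text_embedding).unsqueeze(1)   # (batch, 1, dim)
        x = torch.cat([cond, noisy_patches], dim=1)          # (batch, 1 + patches, dim)
        x = self.backbone(x)
        return self.out(x[:, 1:])                            # predicted clean patches

# One illustrative training step: corrupt the patches, then regress the originals.
model = PatchDenoiser()
clean_patches = torch.randn(2, 64, 256)   # (batch, spacetime patches, patch_dim)
text_embedding = torch.randn(2, 256)      # stand-in for an encoded prompt
noise = torch.randn_like(clean_patches)
noisy_patches = clean_patches + noise     # a real diffusion model scales noise by a timestep schedule
loss = nn.functional.mse_loss(model(noisy_patches, text_embedding), clean_patches)
loss.backward()
```

Because the patches are indexed by space and time together, the same model can in principle be fed clips of different lengths, resolutions, and aspect ratios, which is the flexibility OpenAI is claiming above.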
When you give Sora a prompt, GPT first expands it into a more detailed description. (This is also the case with DALL-E 3.) Sora can also be prompted with images or videos, which it can extend and connect.
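That flow is really just a two-step pipeline. The sketch below is purely hypothetical (Sora has no public API, and both function names are invented placeholders): a language model rewrites the short user prompt into a detailed caption, and the video model is then conditioned on that caption and, optionally, on reference images or clips.

```python
# Hypothetical pipeline sketch; neither function corresponds to any real, public API.
def expand_prompt(short_prompt: str) -> str:
    """Stand-in for the GPT step that rewrites a terse prompt into a detailed caption
    (the same trick used for DALL-E 3)."""
    return (short_prompt + ", golden-hour light, 35mm film look, "
            "slow dolly shot, highly detailed")

def generate_video(detailed_prompt: str, reference_media=None):
    """Placeholder for the video model itself, which per OpenAI can also take
    images or existing clips to extend or connect."""
    raise NotImplementedError("Sora is not publicly available")

detailed = expand_prompt("a corgi surfing at sunset")
# generate_video(detailed, reference_media=["surf_clip_01.mp4"])  # hypothetical call
```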
However, Sora isn’t perfect. The videos it generates sometimes have odd physics. They usually make sense, and they are remarkably consistent when it comes to things like object permanence, style, and perspective, but not always. In one video, which is supposed to depict a glass of juice breaking, the juice somehow spills out of the bottom of the glass before the glass breaks. But when you think back to what AI video generators looked like a little while ago, Sora is truly amazing. Then again, we have known for the past few years that generative AI has surpassed and will continue to surpass all of our expectations, so perhaps expectations are useless now.
Like a human, generative AI cannot and is not expected to calculate every event based on precise laws of physics. Rather, it notices patterns that usually apply. Sora essentially simulates the real world in order to predict what is supposed to happen in a video. It can also simulate other worlds, like Minecraft.
Sora is not yet available to the general public (or is it?); it’s currently being red-teamed and otherwise tested. OpenAI is trying to remove “misinformation, hateful content, and bias.” Like other OpenAI products, Sora will reject prompts “that request extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others.” This is typically done through reinforcement learning from human feedback. Other industries are not usually expected to pay so much heed to how people use their tools, but generative AI companies are expected to do so because their industry is so new and so powerful. Also, OpenAI is ostensibly a nonprofit and has always been interested in safety research. They say that they will be “engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology.”
All of this information is posted on OpenAI’s webpage for the upcoming product. It remains to be seen when we will be able to use Sora, and how much it will cost.