OpenAI unveiled its new text-to-video AI to great fanfare on Thursday, dazzling the masses — or, at least, all of the Very Online AI Guys — with a series of AI-generated video clips. It’s safe to say that the hype machine is fully up and running: although the model has yet to be opened to the public, X-formerly-Twitter was quickly stacked with long and lionizing threads sharing various Sora-spun videos and the prompts used to generate them.
“It’s been 24 hours since the OpenAI changed the AI video world with Sora,” read one such post, shared by AI influencer Rowan Cheung. “Here are the 14 most mindblowing video generations so far…”
“Yesterday, Sora dropped, revolutionizing AI video forever,” added another blue-check AI optimist. “Simulation Theory believers are feeling vindicated.”
To OpenAI’s credit, the first Sora videos it released — like this sharp-looking scene of AI-generated woolly mammoths — are legitimately impressive. And the company surely laid out the red carpet, proclaiming in an introductory blog post that the new model “understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.” High praise!
But as with so many AI demos, the more Sora-generated clips we see, the clearer it becomes that the text-to-video model's outputs suffer from many of the same issues that plague other video- and image-generating AI programs.
For example, take this Altman-shared video created from the prompt: “a bicycle race on ocean with different animals as athletes riding the bicycles with drone camera view.” At first glance, it looks pretty solid! The more you look, though, the weirder it gets. Some of the animals are simply floating in the air — hi, hover-dolphin! — while others are freakish, made-up ocean monsters. (Seriously, what is that thing on the left?!)
In another aquatic flop, the AI generates a video of “New York City submerged like Atlantis,” where “fish, whales, sea turtles and sharks swim through the streets of New York.” But the city isn’t exactly submerged; though the streets are full of water, the sea creatures are all floating above the waterline, seemingly swimming through the air.
Further, otherworldly body horror and a flawed overall sense of physics, both of which have become staples of generative AI-chefed imagery, plague many of Sora's outputs. (In its announcement blog post, OpenAI does warn that Sora "may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect," a notable caveat given the company's earlier declaration that Sora understands "how those things exist in the physical world.")
The issues can be glaring. In this video of a cat waking up its owner, for example, the owner’s shoulder seems to morph into the comforter, while a rogue hand suddenly sprouts from the blankets in a place that makes no realistic sense. (Hands generally seem to be an obstacle for Sora, just like they are with other generative AI models.) And elsewhere, a clip generated from the prompt “archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care” starts with some pretty impressive realism — until the plastic chair starts morphing, melting, and floating in mid-air.
This Sora breaks my brain.
What even is reality anymore tbh
Prompt: Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care. pic.twitter.com/CuvvF2ro7I
— Harrison Kinsley (@Sentdex) February 15, 2024
Again, it’s worth re-upping the fact that Sora isn’t yet publicly accessible. Per its blog post, Sora has only been opened to a gaggle of red teamers in addition to a “number of visual artists, designers, and filmmakers,” the latter of whom, according to OpenAI, will offer “feedback on how to advance the model to be most helpful for creative professionals.”
“We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI,” the post continues, “and to give the public a sense of what AI capabilities are on the horizon.”
From where Sora is right now, though, it’s hard to imagine that this thing is spitting out full-fledged films or animations that don’t just feel like fever dreams anytime soon. Plus, according to experts, full-on body horror, ocean freaks, and floating chairs are only Sora’s more obvious issues.
“The reason I’m not scared (yet) of the Sora vids as an animator is that animation is an iterative process, especially when working for a client,” tweeted animator Owen Fern, noting several sloppy — but certainly more nuanced — mistakes in one of Sora’s animated outputs. “Here’s a bunch of notes to improve one of the [animations], which a human could address, but AI would just start over.”
The reason I’m not scared (yet) of the Sora vids as an animator is that animation is an iterative process, especially when working for a client
Here’s a bunch of notes to improve one of the anims, which a human could address, but AI would just start over
What client wants that? pic.twitter.com/VGAjGguZIQ
— Owen Fern (@owenferny) February 16, 2024
Still, other video-building and editing platforms like Runway have garnered plenty of interest through their text-to-video AI services. And studios in Hollywood, many of which are already heavily dependent on computer-generated imagery, have made it very clear that they're interested. Sora's outputs might be bizarre and distorted, but that doesn't mean users, whether individual adopters or massive movie studios looking to cut costs, won't rush to use AI when they can. How such moves to automate will ultimately impact creative workers and the content economy remains to be seen.
“We all laughed in 2022 when Midjourney first came out and said, ‘Oh, that’s cute,'” Reid Southen, a Michigan-based movie concept artist, told The New York Times. “Now people are losing their jobs to Midjourney.”
More on AI and video: Google Not Releasing New Video-Generating AI Because of Small Issue with Gore, Porn and Racism