You’ll soon get to try out OpenAI’s buzzy text-to-video generator for yourself. In an interview with The Wall Street Journal, OpenAI chief technology officer Mira Murati says Sora will be available “this year” and that it “could be a few months.”
OpenAI first showed off Sora, which is capable of generating hyperrealistic scenes based on a text prompt, in February. The company initially made the tool available only to visual artists, designers, and filmmakers, but that didn’t stop some Sora-generated videos from making their way onto platforms like X.
In addition to making the tool available to the public, Murati says OpenAI has plans to “eventually” incorporate audio, which has the potential to make the scenes even more realistic. The company also wants to allow users to edit the content in the videos Sora produces, as AI tools don’t always create accurate images. “We’re trying to figure out how to use this technology as a tool that people can edit and create with,” Murati tells the Journal.
When pressed on what data OpenAI used to train Sora, Murati didn’t get too specific and seemed to dodge the question. “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” she says. Murati also says she isn’t sure whether Sora was trained on videos from YouTube, Facebook, and Instagram. She confirmed to the Journal only that Sora uses content from Shutterstock, with which OpenAI has a partnership.
Murati also told the Journal that Sora is “much more expensive” to power. OpenAI is trying to make the tool “available at similar costs” to DALL-E, the company’s AI text-to-image model, when it’s released to the public. You can see more examples of the kinds of videos the tool can produce in the Journal’s report, including an animated bull in a china shop and a mermaid smartphone reviewer.
As we approach the 2024 presidential election, concerns about generative AI tools and their potential to create misinformation have only increased. Murati says that when Sora is released, it likely won’t be able to produce images of public figures, in line with DALL-E’s policies. Videos will also have a watermark to distinguish them from the real thing, but as my colleague Emilia David points out, watermarks aren’t a perfect solution.