in

Video generation models as world simulators


This technical report focuses on (1) our method for turning visual data of all types into a unified representation that enables large-scale training of generative models, and (2) qualitative evaluation of Sora’s capabilities and limitations. Model and implementation details are not included in this report.

Much prior work has studied generative modeling of video data using a variety of methods, including recurrent networks,[^1][^2][^3] generative adversarial networks,[^4][^5][^6][^7] autoregressive transformers,[^8][^9] and diffusion models.[^10][^11][^12] These works often focus on a narrow category of visual data, on shorter videos, or on videos of a fixed size. Sora is a generalist model of visual data—it can generate videos and images spanning diverse durations, aspect ratios and resolutions, up to a full minute of high definition video.

This German nonprofit is building an open voice assistant that anyone can use

This German nonprofit is building an open voice assistant that anyone can use

EU 'final' talks to fix AI rules to run into second day -- but deal on foundational models is on the table

No ‘GPT’ trademark for OpenAI