OpenAI recently unveiled its first video model, Sora. According to OpenAI, Sora carries over the image quality and instruction-following ability of DALL·E 3, allowing users to enter a text description and generate smooth, high-definition video up to one minute long.
OpenAI’s Video Model Mimics the Physical World
The model is designed to simulate the real physical world, which OpenAI presents as a significant advance in artificial intelligence's capacity to understand and interact with real-world scenes. One of the released videos shows a lively scene generated from a prompt about the Spring Festival in the Chinese Year of the Dragon. Amid a bustling crowd, performers carry out a dragon dance with smooth, natural movements, while some onlookers capture the moment on their mobile phones. The scene is rich in precise detail.
Another video features a stylish woman strolling through the streets of Tokyo after rain. The puddles on the road and the neon reflections are remarkably realistic, and without a label many viewers might not realize the clip is AI-generated. OpenAI said that its technical team is teaching AI to understand and simulate the physical world in motion, with the goal of training models that help solve problems requiring real-world interaction. Video generation from text prompts is just one step in that overall plan.
Sora can currently generate complex scenes with multiple characters and specific types of motion. According to OpenAI, it understands not only what the user asks for in the prompt but also how those objects exist in the physical world. However, Sora has limitations. OpenAI acknowledged that the model may have trouble accurately simulating the physics of a complex scene and may not understand specific instances of cause and effect. It may also confuse spatial details such as left and right, and may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.
Despite these limitations, the release of OpenAI's first video model prompted concern among internet users about potential job losses and the decline of the stock footage industry. Some speculate that OpenAI will accelerate the evolution of AI following the success of its large language models. Visual artists, designers, filmmakers, and OpenAI employees have gained access to Sora and continue to publish new works that showcase the creative possibilities of AI-generated video.