Meta today released the MovieGen family of media foundation AI models, which can generate realistic videos with sound from text prompts. The MovieGen family includes two main models: MovieGen Video and MovieGen Audio.
MovieGen Video is a 30-billion-parameter transformer model that generates high-quality, high-definition images and videos from a single text prompt, producing videos up to 16 seconds long at 16 frames per second.
MovieGen Audio is a 13-billion-parameter transformer model that can take video input and optional text prompts, and generate up to 45 seconds of high-fidelity audio synchronized with the input video. This new audio model can generate ambient sounds, instrumental background music, and Foley sounds. Meta claims that it offers state-of-the-art results in terms of audio quality, video-to-audio alignment, and text-to-audio alignment.
These models aren't limited to creating brand-new videos; they can also edit existing videos using simple text prompts. MovieGen allows users to make localized edits, such as adding, removing, or replacing elements, as well as global changes such as altering the background or style. For example, given a video of someone throwing a ball, a simple text prompt can change it to someone throwing a watermelon while keeping the rest of the original content intact.
The MovieGen models also allow users to create personalized videos. Given a character image and a text prompt, the models can generate a personalized video that retains the character's appearance and movements. Meta claims that these models offer state-of-the-art results in terms of character preservation and natural motion in videos.
Meta claims that these models produce better videos than other video generation models, including OpenAI's Sora and Runway Gen-3. Meta is currently working with creative professionals to further refine the models before releasing them publicly.