One month after its launch, Vidu has once again ushered in a major update!
Vidu, China's first fully self-developed original video model, built jointly by Shengshu Technology and Tsinghua University, was unveiled to the world at the end of April and officially launched for general use at the end of July.
In just over a month, Vidu has received a major feature update. On September 11, Shengshu Technology held a media open day at which Vidu debuted its "Subject Consistency" feature, a world first that keeps any chosen subject consistent across generated footage, making video generation more stable and controllable. The feature is currently free; register to try it at www.vidu.studio.
1. A world first: one image is all it takes to lock the subject
With "Subject Reference", users upload a picture of any subject; Vidu locks onto that subject's appearance and, guided by text prompts, can switch scenes at will while outputting video of the same subject.
The feature is not limited to one category of object: it works on "any subject", whether a person, animal, product, anime character, or fictional creature, and keeps it consistent and controllable throughout video generation. This is a major innovation in the field, and Vidu is the first video model in the world to support this capability.
For example, with character subjects, whether real or fictional, Vidu maintains a coherent image across different environments and camera setups.
For animals, Vidu keeps details consistent across different environments and through large movements.
Likewise, a product's appearance and details remain highly consistent across different scenes.
Uploaded subjects need not be photorealistic, either: for anime characters, fictional creatures, and the like, Vidu maintains the same high degree of consistency.
Capabilities such as image-to-video and character consistency already exist among large video models, but Vidu's "Subject Reference" feature marks a qualitative leap in consistency. A concrete comparison:
- Image-to-video: generation continues from a fixed first frame, so the target scene cannot be produced directly, limiting the diversity of content and the freedom of scene choice.
- Character-to-video: consistency is limited to a character's facial features, making it hard to keep the character's overall appearance stable.
- Subject Reference: works on any subject, not just characters; for character subjects, users can choose to keep only the face consistent or to keep the character's entire appearance highly consistent, with the target scene output flexibly from the text prompt.
Take a concrete case: input a character photo of Lin Daiyu together with the same prompt, "drinking coffee in a modern café". With Vidu's "Subject Reference" feature, Lin Daiyu's appearance is perfectly preserved in the modern setting, and the output scene looks natural and realistic.
(Comparison videos: original image / image-to-video / character consistency / subject reference)
2. Changing the "rules of the game" for video creation
Competition among large video models is increasingly fierce. Although new models keep emerging, they share a core weakness: a lack of controllability, and in particular a lack of consistency.
In real video creation, content usually revolves around specific subjects, whether characters or objects, whose appearance must stay continuous throughout the video. Existing video models struggle with this: the subject tends to collapse, drifting or deforming, during generation.
Maintaining subject consistency is even harder when complex actions and interactions are involved. Moreover, a video model's output is highly random, and fine control over details such as camera work and lighting is still lacking. So although current video models have made real breakthroughs in visual expressiveness, physical plausibility, and imagination, the lack of controllability limits their use for coherent, complete video content. Today, most AI video is still spliced together from independently generated clips, and the plots are not coherent enough.
To work around this, the industry has tried a "text-to-image first, then image-to-video" pipeline: use an AI drawing tool such as Midjourney to generate storyboard images, secure subject consistency at the image level first, then convert those images into video clips and edit them together.
The problem is that consistency in AI-generated images is itself imperfect, and usually has to be fixed through repeated revisions and partial redraws. Worse, real video production involves many scenes and shots; with multi-shot scenes this step can consume more than half of the entire process, and the resulting video loses creativity and flexibility through over-reliance on storyboards.
Vidu's Subject Reference feature changes that game completely. It drops the traditional storyboard-generation step and produces video footage directly from "upload a subject image + enter a scene prompt". This not only dramatically reduces the workload, it also removes the constraints storyboards impose on content, letting creators build rich, flexible videos from text descriptions and their own imagination. The breakthrough brings unprecedented freedom and innovation to video creation.
(The picture shows the reshaping of the AI video production process)
3. Accelerate the creation of story and advertising videos
The feature has indeed won high praise from many front-line creators.
- Three costume photos are all it takes to complete a short film
By locking in the appearance of a character or object, "Subject Reference" makes the storyline more coherent on one hand and, on the other, frees creators to explore the depth and breadth of their stories.
Li Ning, a young director and the initiator of Guangchi Matrix, is creating "Xuanyu", China's first AIGC theatrical film. Using Vidu, he pre-produced a clip of the male lead in which every shot of the character was generated from just three costume photos: a close-up, a medium shot, and a long shot. Sharing his process, Li Ning noted that earlier AI filmmaking relied on the traditional text-to-image plus image-to-video pipeline, which made storyboard continuity hard to control and overall character appearance hard to keep consistent. Vidu's "Subject Reference" feature markedly improves overall character consistency, removes the need to generate large numbers of images up front, and makes character movement and shot transitions more natural, greatly easing long-form narrative creation.
Shi Yuxiang (Senhai Fluorescence), a director at China Media Group and an AIGC artist, created the animated short "Summer Gift". Sharing his process, he said that compared with the basic image-to-video feature, "Subject Reference" breaks free of the shackles of static images: the generated footage is more expressive and freer, greatly improving creative coherence. It also saved him roughly 70% of the image-creation workload, significantly raising efficiency and letting him focus on polishing the story rather than generating image assets. Combined with Vidu's handling of complex motion and its multi-element comprehension, he added, Vidu feels like a real assistant "animator" working alongside him.
- From a single product image, a finished commercial in just six hours
The "Subject Reference" feature shows great potential for commercial films. A key requirement in advertising is keeping the brand image consistent across multiple shots and scenes. In a running-shoe ad example, every shot was generated from a single product image, and the shoes remain highly consistent throughout the video, across angles, backgrounds, and motion.
According to Shengshu Technology, the video took one person just six hours end to end, covering planning, asset generation, and post-editing; generating the 30 AI video clips took only three hours, and the entire process referenced a single product image. Traditional ad production depends heavily on location shooting and post-production, with long timelines and high costs; with Vidu, production costs drop sharply, the whole output pipeline becomes more efficient, and brands can develop new material more flexibly.
Alongside the feature release, Shengshu Technology also launched a partner program, inviting organizations in advertising, film and television, animation, gaming, and other industries to jointly explore new video-creation models and to cooperate on content co-creation, technical support, and market expansion. The first batch of partners includes well-known companies and institutions such as Happy Twist, Maoyan Entertainment, Giant Network, Markor Home, Sunac Culture, the Henan Intangible Cultural Heritage Protection and Wisdom Center, and Li Keqi Painting Institute.
4. "Subject Reference" is the beginning of complete AI narrative
As China's first fully self-developed video model, Vidu has drawn wide attention at home and abroad since its release. After its official launch at the end of July, Vidu's performance placed it in the "first echelon" of global video models, with strengths in dynamics, semantic understanding, animation styles, and fast inference, and it set off a wave of AI-themed trends, such as "hugging across time and space", on overseas social platforms like TikTok. According to third-party data, Vidu ranked first worldwide in web-product visit growth in its first month.
In professional creation, Vidu has also partnered with AI artists at home and abroad to explore AI-empowered creative models. For example, the animated short "All the Way South", co-created with Ainimate Lab, winner of the AIGC short-film unit at the Beijing International Film Festival, approaches traditional animation quality at only 1/40 of the traditional cost. Chen Liufang, head of AI at Ainimate Lab, said the short's creative team was just three people, a director, a storyboard artist, and an AIGC technology expert, with a production cycle of about one week; the traditional pipeline would need around 20 people across roles such as directing, art, modeling, lighting, and rendering, and about a month. Vidu thus dramatically shortened the production cycle and sharply reduced costs.
Tang Jiayu, CEO of Shengshu Technology, said that the launch of "Subject Reference" marks the beginning of complete AI narrative, and that AI video creation will move toward a more efficient, more flexible stage. Whether for short videos, animation, or commercials, in the art of storytelling a complete narrative system is the organic combination of elements such as a consistent subject, consistent scenes, and a consistent style.
For a video model to achieve narrative completeness, then, it must be fully controllable across these core elements. The Subject Reference feature is an important step forward for Vidu on consistency, but it is only the beginning. Going forward, Vidu will keep exploring precise control of more complex elements, such as multi-subject interaction, unified style, and stable switching between changing scenes, to serve higher-level narrative needs.
Looking further ahead, once full controllability is achieved, the video-creation industry will undergo a disruptive transformation. Characters, scenes, styles, and even elements like camera work and lighting will become parameters that can be flexibly tuned. Users will complete a video work simply by adjusting parameters, and behind each work will stand a unique worldview and self-expression that the user builds with AI.