
For the first time, AI generates videos in real time! You Yang's team's new work, netizens: This is a new era

Author: QbitAI (量子位)

New work from Yang You's team: the first DiT-based real-time video generation method is here!

Let's get a feel for the effect first (the new method on the right):

[The video cannot be embedded here; you can view it on the QbitAI official WeChat account.]

This is the team's test on Open-Sora, generating five 4-second (192-frame) videos at 480p resolution.

The new method is called Pyramid Attention Broadcast (PAB), developed by Yang You of the National University of Singapore together with three of his students.


Specifically, PAB achieves up to 21.6 FPS and a 10.6x speedup by reducing redundant attention computation, without sacrificing quality on popular DiT-based video generation models, including Open-Sora, Open-Sora-Plan, and Latte.

As a training-free approach, PAB provides real-time capabilities for any future DiT-based video generation model.

After watching the comparison of effects, netizens were amazed:

This will be a new era.

It has also drawn reposts and comments from researchers. For example, MIT PhD Yilun Du said:

This is cool work that shows how to speed up video generation to real-time! It could open up new frontiers for real-world use cases of video policies and simulation.

So, how does the new method solve the problem of generating videos in real time?

Reduce redundant attention calculations

To start, the team compared the difference in attention outputs between the current diffusion step and the previous one.

These differences are quantified by mean squared error (MSE) and averaged over all layers at each diffusion step.
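This analysis can be sketched as follows. Note this is an illustrative reconstruction, not the paper's code; the array shapes, the toy noise scales, and names like `attn_outputs` are all assumptions:

```python
import numpy as np

def step_differences(attn_outputs):
    """attn_outputs: list of per-step attention outputs (same shape).
    Returns the MSE between each step's output and the previous one."""
    return [float(np.mean((curr - prev) ** 2))
            for prev, curr in zip(attn_outputs[:-1], attn_outputs[1:])]

# Toy data mimicking the observed U-shape: outputs are noisy at the
# early and late steps and nearly constant in the middle.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
scales = [1.0, 0.9, 0.01, 0.01, 0.01, 0.9, 1.0]
outputs = [base + rng.normal(scale=s, size=base.shape) for s in scales]

diffs = step_differences(outputs)
print(diffs)  # middle entries come out much smaller than the ends
```

Plotting such per-step MSE values is how a U-shaped stability pattern becomes visible.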

The team captured two key messages:

  • Across diffusion steps, attention differences follow a U-shaped pattern, with small differences in the middle 70% of steps.
  • The magnitude of the differences ranks as: spatial > temporal > cross-attention.

Specifically, the attention differences across time steps show a U-shaped pattern: they change significantly in the first and last 15% of steps, while the middle 70% of steps are very stable, with little difference.

Second, within the stable middle section, the three attention types behave differently: spatial attention varies the most, involving high-frequency elements such as edges and textures; temporal attention shows mid-frequency changes related to motion and dynamics in the video; cross-attention is the most stable, associating text with video content, like a low-frequency signal reflecting the semantics of the text.
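This stability ordering suggests giving each attention type its own broadcast range: the more stable the type, the longer its cached output can be reused. A toy sketch of the idea (the ranges and step counts below are made-up illustrative numbers, not values from the paper):

```python
# Illustrative broadcast ranges: cross-attention (most stable) is reused
# the longest, spatial attention (least stable) the shortest.
BROADCAST_RANGES = {"spatial": 2, "temporal": 4, "cross": 6}

def needs_recompute(step, attn_type):
    """Recompute only every `broadcast_range` steps; reuse the cache otherwise."""
    return step % BROADCAST_RANGES[attn_type] == 0

# Count recomputations over 30 stable middle steps.
counts = {t: sum(needs_recompute(s, t) for s in range(30))
          for t in BROADCAST_RANGES}
print(counts)  # {'spatial': 15, 'temporal': 8, 'cross': 5}
```

The staggered ranges are what gives the method its "pyramid" shape: each more stable attention type skips more recomputation.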

Based on these observations, the team formally proposed PAB to cut unnecessary attention computation.


PAB saves computation by broadcasting the attention output of one step to several subsequent steps, with the broadcast range set according to how much each attention type varies.

For example, just as a radio station broadcasts one signal to many listeners, if one step's attention results are still applicable for the next few steps, there is no need to recompute them; the previous results are reused directly.
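A minimal sketch of this broadcast mechanism, using a stand-in attention function and invented names (not the PAB implementation):

```python
def run_attention(step, x):
    # Stand-in for a real attention layer; any expensive computation fits here.
    return [v * (step + 1) for v in x]

class BroadcastAttention:
    """Recompute attention only every `broadcast_range` steps;
    in between, reuse (broadcast) the cached output."""
    def __init__(self, broadcast_range):
        self.broadcast_range = broadcast_range
        self.cache = None
        self.computed_steps = 0

    def __call__(self, step, x):
        if self.cache is None or step % self.broadcast_range == 0:
            self.cache = run_attention(step, x)
            self.computed_steps += 1
        return self.cache

attn = BroadcastAttention(broadcast_range=3)
outs = [attn(s, [1.0, 2.0]) for s in range(9)]
print(attn.computed_steps)  # 3 -> recomputed only at steps 0, 3, 6
```

Because the method only reuses cached outputs, it requires no retraining, which is why it applies directly to existing models.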

The team found that this simple strategy achieves up to a 35% speedup with negligible quality loss, even without any post-training.

To push PAB further, the team improved sequence parallelism based on Dynamic Sequence Parallelism (DSP).


Sequence parallelism reduces latency by splitting the video across multiple GPUs, but under DSP, temporal attention requires two all-to-all communications, resulting in high communication overhead.

PAB cuts this communication overhead by more than 50%, because broadcast temporal attention no longer needs to be recomputed, improving the efficiency of distributed inference for real-time video generation.
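The saving can be estimated with a back-of-envelope sketch. All numbers here (step count, layer count, broadcast range) are illustrative assumptions, not measurements from the paper; the premise is that each temporal-attention computation under DSP costs two all-to-all communications, so every broadcast step skips both:

```python
def all_to_all_count(num_steps, num_temporal_layers, broadcast_range):
    """Two all-to-all communications per temporal-attention computation;
    broadcast steps reuse cached outputs and communicate nothing."""
    computed = sum(1 for s in range(num_steps) if s % broadcast_range == 0)
    return 2 * num_temporal_layers * computed

baseline = all_to_all_count(num_steps=30, num_temporal_layers=28, broadcast_range=1)
with_pab = all_to_all_count(num_steps=30, num_temporal_layers=28, broadcast_range=5)
print(baseline, with_pab)  # 1680 336
print(f"communication reduced by {1 - with_pab / baseline:.0%}")  # 80%
```

The exact reduction depends on the broadcast range chosen; the article reports an overall saving of more than 50%.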

With parallelism, PAB achieves up to 21.6 FPS and a 10.6x speedup without sacrificing quality on popular DiT-based video generation models, including Open-Sora, Open-Sora-Plan, and Latte.


Concretely, the team measured the total latency of PAB generating a single video for different models on eight NVIDIA H100 GPUs.

On a single GPU, PAB delivers a 1.26x to 1.32x speedup, and the improvement remains stable across schedulers.

When scaling to multiple GPUs, PAB reaches up to a 10.6x speedup, and the gain grows almost linearly with the number of GPUs.


The team behind it

A brief introduction to the four team members who proposed PAB.


Professor Yang You should be familiar to many: he earned a master's in computer science from Tsinghua University and a Ph.D. from UC Berkeley, and after graduation joined the Department of Computer Science at the National University of Singapore as a Presidential Young Professor.

In July 2021, he founded Luchen Technology in Zhongguancun, Beijing.


Co-author Xuanlei Zhao holds a bachelor's degree in computer science and electronic information from the University of Science and Technology of China and is pursuing his master's and Ph.D. at the National University of Singapore (currently a Ph.D. student) under the supervision of Yang You. His research interests include, but are not limited to, algorithms, data structures, computer networks, signal processing, and communication systems.


Co-author Kai Wang is a Ph.D. student in the NUS HPC-AI Lab under the supervision of Yang You. He did his undergraduate studies in the Department of Electrical Engineering and Automation at Beijing Normal University, Zhuhai, and his master's at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (MMLAB-SIAT). His research focuses on data-centric artificial intelligence and efficient machine learning. He co-led the project with Prof. Yang You.


The last author, Xiaolong Jin, did his undergraduate studies in the School of the Gifted Young at the University of Science and Technology of China and is currently a Ph.D. student at Purdue University.


The paper is now publicly available; interested readers can learn more there.

— END —

QbitAI · Signed account on Toutiao (头条号)

Follow us and be the first to know about cutting-edge technology trends
