
Computing power management is complex and training costs are too high: experts discuss how to solve the AI dilemma


China News Network

2024-06-29 21:32 | Official account of China News Network

  Chinanews.com, June 29 (China News Financial Reporter Wu Tao) "The rapid development of large models has forced AI to confront many bottlenecks, such as complex computing power management, high training and inference costs, and difficult task scheduling."


  The scene of the press conference. Photo courtesy of the organizers.

  Speaking recently at a press conference held by Ant Digital, Li Wei, deputy director of the Institute of Cloud Computing and Big Data at the China Academy of Information and Communications Technology, said that cloud native, with its high availability, elasticity, and scalability, has become key to breaking through the AI dilemma. She added that the steadily improving large-model tool chain on cloud-native PaaS platforms will accelerate the adoption of large-model technology in industry applications.

  Li Wei said that, according to surveys, cloud native was playing a role in Internet application development well before AI: at more than half of Chinese enterprises, the majority of Internet applications run on cloud-native architecture, and even traditional core systems have now moved to the cloud.

  According to reports, large models in the AI era place enormous demands on computing power: GPT-3.5, with 175 billion parameters, reportedly requires around 500 NVIDIA cards, while a 10-trillion-parameter GPT-5 would require some 500,000. In this context, the cloud takes on a new and critical role.

  "No single data center can host the computation of 500,000 NVIDIA cards, so many large models will inevitably be computed across domains. But after going cross-domain, will the other side also be running NVIDIA cards, or some other intelligent-computing infrastructure? Not necessarily. What do we need in that case? Who can deploy applications onto that computing power?"

  Li Wei's answer is the cloud: the development bottlenecks of the AI era are, in her view, largely addressed by cloud native. Cloud native shields applications from differences in the underlying computing power. An application does not care whether a CPU or GPU sits underneath, or which vendor's card it is; it is simply deployed onto the platform and consumes whatever computing power is available.
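The "shielding" idea Li Wei describes can be illustrated with a toy scheduler. This is a minimal sketch with entirely hypothetical names (ComputeNode, CloudNativeScheduler), not any real platform's API: the application declares only a capability it needs, and the platform decides which concrete hardware satisfies it.

```python
# Toy illustration of hardware abstraction: apps declare needs, not vendors.
from dataclasses import dataclass

@dataclass
class ComputeNode:
    name: str
    device_type: str  # e.g. "cpu", "nvidia-gpu", "other-accelerator"

class CloudNativeScheduler:
    """Hypothetical scheduler: matches an app's declared needs to any node
    whose capability fits, without the app ever naming a specific card."""
    def __init__(self, nodes):
        self.nodes = nodes

    def place(self, app_name, needs_accelerator=False):
        for node in self.nodes:
            # A CPU-only node is skipped only when an accelerator is required.
            if not needs_accelerator or node.device_type != "cpu":
                return f"{app_name} -> {node.name} ({node.device_type})"
        raise RuntimeError("no suitable node available")

nodes = [ComputeNode("node-a", "cpu"),
         ComputeNode("node-b", "nvidia-gpu"),
         ComputeNode("node-c", "other-accelerator")]
sched = CloudNativeScheduler(nodes)
print(sched.place("web-app"))                            # any node will do
print(sched.place("llm-train", needs_accelerator=True))  # skips CPU-only nodes
```

In real cloud-native platforms the same separation appears as declarative resource requests that the orchestrator resolves against heterogeneous nodes; the application manifest stays the same regardless of the hardware beneath it.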

  "Many enterprises already use cloud native to manage thousands of servers in a unified way, improving efficiency and reducing costs across the board." Li Wei emphasized that combining the cloud with AI can substantially reduce the engineering cost of AI, so that large models can truly be put into operation and delivered as services. (ENDS)


