
Huawei's AI-ready data infrastructure helps operators build a closed-loop AI business faster

Author: Observe at hand

As artificial intelligence technology continues to evolve, industrial digitalization is accelerating, and the leap from "single-point breakthrough" to "ubiquitous intelligence" has become an irreversible trend. Now that chasing large models has become standard practice across the industry, the new intelligent computing center built for large models has become a focus of attention. Data storage, which sits at the back end of the entire network cycle, is also being redefined by the AI era.


MWC Shanghai 2024 (MWCS) was held recently. During the event, I had an in-depth conversation with Xie Qiangqiang, Vice President of Huawei's Data Storage Product Line. He explained why data storage technology holds a core position in today's wave of digitalization, networking, and intelligence, and described how Huawei's data storage business aims to drive progress across the industry through continuous innovation.

Unleashing the value of data: intelligent computing centers embrace the large-model transformation

The release of a series of large models, represented by ChatGPT, Sora, and Gemini, has not only electrified the global technology community but also consolidated the strategic position of artificial intelligence in changing how people will work and live, setting off a generational leap in social development and competitiveness.

Since 2017, ministries and local governments have successively issued policies to guide the development of the AI industry, encouraging enterprises to recruit more talent and invest more in R&D, and explicitly calling for the orderly development of intelligent computing centers.

Unlike traditional cloud data centers and supercomputing centers, an intelligent computing center is a new type of data center that is built intensively around intelligent computing power such as GPUs and AI accelerator cards.

Operators have been active in this area. China Mobile began its strategic layout in artificial intelligence in 2013. Its Jiutian platform offers AI capabilities in computer vision, natural language processing, intelligent voice, network intelligence, and other fields, and has become one of the "AI national teams" among central enterprises. Building on the advantages of cloud-network integration and years of technical accumulation at e Cloud, China Telecom has developed its own Xingchen AI large model, built a complete basic framework of semantic, speech, vision, and multi-modal large models, and fully open-sourced it this year. China Unicom has released the Yuanjing model, which in the industrial field alone already supports personalized clothing customization, product quality inspection, and production safety management.

While rapidly deploying large models, operators have found that the intelligent computing centers they built earlier were designed mainly to host small and medium-sized models and to support enterprises' digital and intelligent transformation. Those centers still need improvement in technical standards, ecosystem construction, business development, and overall operations.

At the same time, operators that are ready to embrace the AI era often face two major problems. The first is the fragmentation of data within customer organizations: high-value data is often scattered across departments without unified integration and management, which keeps the data from delivering its full value. The second is how to ensure that valuable data can flow quickly and securely in the process of capacity spillover, so that business operations and customer service become more efficient.

For the development of large AI models, Xie Qiangqiang therefore argues that there is no AI without data, and that the scale and quality of data determine how intelligent AI can become. Only by revitalizing data assets and accelerating the entire AI process can the value of data be fully realized.

AI-Ready Data Infrastructure Reference Architecture, from "Stacking Computing Power" to "Tapping Potential"

In August last year, China Mobile stated in the "NICC New Intelligent Computing Center Technology System White Paper" that the technical architecture of the NICC new intelligent computing center needs to be systematically rebuilt in five major areas (interconnection, computing efficiency, storage, platform, and energy saving) to support large-model innovation and transformation across thousands of industries. In the section on "new storage - mining data value", China Mobile set out the key challenges that storage faces in intelligent computing scenarios in terms of performance, capacity, and scheduling. The white paper proposes solutions such as multi-protocol converged storage to connect heterogeneous data, globally unified storage to break the limits of standalone systems, and a unified memory pool based on a computing bus.

Data infrastructure is the "granary" of large models: it supplies the data that nourishes them, and without sufficient high-quality data their ability to learn is greatly reduced. In addition, the maturity of the data infrastructure directly affects the training speed and availability of large models, which in turn affects how quickly large models develop in each field.

Operators' central AI clusters have now entered the era of 10,000-card and 100,000-card scale, and the penetration of edge AI models into industries is also accelerating.

To enable intelligent industry upgrades and innovations, Huawei proposes an AI-ready data infrastructure reference architecture and solutions to help operators build secure, reliable, and open AI data infrastructures.

According to Xie Qiangqiang, the current AI-ready data infrastructure reference architecture has been implemented in two major scenarios.

The first is the intelligent computing center scenario for centralized training. To address three major problems in 10,000-card clusters (nonlinearity, data silos, and the surge in the volume of data for large AI models), Huawei proposes the AI Data Lake Solution. It uses technologies such as a unified namespace, intelligent data tiering, separation of the data plane and control plane, and built-in security to ensure strongly consistent access to data, improve cluster utilization, reduce GPU wait time, and significantly improve the overall availability and computing efficiency of AI clusters.

The second is the ToB scenario of edge training and inference. Industries such as manufacturing, healthcare, and education have abundant data and application scenarios that are well suited to the in-depth application of large AI models. Here, Huawei's solution focuses mainly on RAG (retrieval-augmented generation), supplemented by model fine-tuning, to overcome problems with timeliness, inference accuracy, and interactivity. Huawei works with operators to provide a full range of edge training and inference solutions, including installation, data processing, model fine-tuning, application development, and O&M optimization, to remove obstacles to putting large models to work in enterprise applications.
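To make the RAG idea concrete, here is a minimal sketch in Python. It is not Huawei's implementation: the sample documents, the function names, and the simple bag-of-words similarity are all assumptions chosen for illustration. It shows only the general pattern, in which relevant enterprise data is retrieved first and then placed in the prompt so that the model answers from current, local data rather than from its training set alone.

```python
import math
import re
from collections import Counter

# Hypothetical enterprise knowledge snippets (illustrative only, not from the article).
DOCUMENTS = [
    "Line 3 torque wrench calibration is due every 30 days.",
    "Outpatient records are archived to cold storage after 5 years.",
    "Students submit lab reports through the campus portal.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Assemble the augmented prompt that would be sent to a language model."""
    context = "\n".join(retrieve(query, documents, top_k))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )


question = "How often is torque wrench calibration due?"
prompt = build_prompt(question, DOCUMENTS)
print(prompt)
```

A production system would typically replace the word-count similarity with vector embeddings and a dedicated index, and would pass the prompt to a model. The sketch stops before the model call so that it stays self-contained.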

From the perspective of technical architecture, AI-ready infrastructure is the core support, covering large-scale computing, efficient storage, and lossless networking. During large-model training, however, data moves frequently between computing, storage, and the network, which increases system overhead and reduces the overall efficiency of the AI cluster. The parameter counts and data volumes of large models are growing exponentially, which places higher demands on storage scalability, stability, performance, and latency.

Huawei's AI-ready data infrastructure reference architecture shifts the development of large AI models from "stacking computing power" to "tapping potential".

In the past, operators kept piling up computing resources to improve AI performance. That approach was costly and energy-hungry, and it did not support the sustainable development of AI technology. AI clusters are like a "gold-devouring beast" in both cost and energy use: as computing power increases, cost and energy consumption grow at an alarming rate.

Take GPT-3 as an example: the electricity used in a single training run corresponds to about 500 tons of CO2 emissions, roughly the amount of electricity that 300 households use in a year. Even more striking, a single training run for Sora is said to consume 1,000 times as much as GPT-3. Energy use on this scale is staggering, and it raises the question of whether the pursuit of AI progress is quietly adding to environmental pressure. Energy has become an important constraint on the further development of AI.

By sizing storage cluster performance appropriately and choosing high-performance, reliable storage solutions, Huawei's AI-ready data infrastructure reference architecture significantly improves the availability of AI clusters, reduces AI development and operating costs, and extends AI applications to the edge, helping operators close the loop on their AI business more quickly.

From behind the scenes to center stage: helping operators build a new closed-loop business

The digital economy is the main economic form that follows the agricultural and industrial economies. It is developing with unprecedented speed, reach, and influence. It is driving profound changes in how people produce, live, and govern, and it has become a key force in reallocating global factor resources, reshaping the global economic structure, and changing the global competitive landscape.

As important participants in the digital economy, operators are actively pursuing the opportunities that artificial intelligence brings. They are making full use of the multiplier effect of data as a production factor and accelerating R&D and deployment in artificial intelligence and big data. In doing so, they are building a "second curve" for their own high-quality development and providing a digital "new engine" for strategic emerging industries and the industries of the future.

Data is the core production factor of the digital economy, so its security, integrity, and availability must be ensured. This is especially true for operators, which not only hold massive amounts of user data but also serve as an important bridge between users and service providers.

Through continuous innovation, Huawei has achieved 100% independent and controllable technology in the storage industry, with product lines covering SAN storage, NAS storage, unified (hybrid) storage, all-flash storage, and distributed storage.

Because data storage sits at the back end of the entire network cycle, its role in the overall AI infrastructure is relatively hidden, and the rest of the industry chain often finds it hard to perceive its presence and importance directly.

Today, the AI era has made data a core element in driving the development of the industry. In a way, without data, there is no way to talk about AI. The storage industry should not only strive to break down data silos and promote data sharing and circulation, but also provide strong support for the development of the AI era by optimizing data storage, improving data quality, strengthening data security and promoting data flow.

Available data show that the global data storage industry is approaching 300 billion US dollars in size. The future market is large and is expected to keep growing rapidly. According to forecasts from third-party consulting firms such as IDC and Gartner, by 2025 the upstream segment of the storage industry chain in mainland China will exceed 260 billion yuan and the midstream and downstream segments will exceed 800 billion yuan, putting total direct investment in the storage industry in mainland China above one trillion yuan.
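The "one trillion yuan" total follows directly from the two segment forecasts quoted above. The short Python check below uses only the article's figures; the variable names are illustrative. Because both figures are stated as lower bounds ("will exceed"), the sum is a lower bound too.

```python
# Figures as quoted in the article (forecast for 2025, in billions of yuan).
upstream_billion = 260          # upstream segment, stated as "will exceed"
mid_downstream_billion = 800    # midstream and downstream, stated as "will exceed"

# Sum of the two lower bounds, then converted from billions to trillions.
total_billion = upstream_billion + mid_downstream_billion
total_trillion = total_billion / 1000

print(f"{total_billion} billion yuan = {total_trillion:.2f} trillion yuan")
```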

As a cornerstone of information technology, the storage industry not only meets the requirements for developing new quality productive forces but also combines with next-generation information technology to create new business models and new growth drivers for other industries.
