
Special Report on Chip Cooling Industry Chain: From Air Cooling to Liquid Cooling, AI Drives Industry Innovation

Author: Think Tank of the Future

(Report Producer/Author: Guohai Securities, Liu Xi)

1. Overview of chip heat dissipation

Origin of chip heat dissipation: The essence of heat generation in electronic equipment is to convert working energy into heat energy

Electronic devices generate heat because part of the energy they consume while working is converted into heat. The chip is the core component of electronic equipment: it processes, stores, and transmits data by manipulating electrical signals. In doing so it generates a great deal of heat, because signal transmission is accompanied by energy losses in resistive, capacitive, and inductive elements, and these losses are dissipated as heat.

Excessive temperature degrades the performance of electronic equipment and can even damage it. According to "Research Status and Development Prospects of Electronic Chip Heat Dissipation Technology", for example, the maximum temperature of an electronic chip that must run stably and continuously should not exceed 85 °C; higher temperatures will damage the chip.

Thermal technology must be continuously upgraded to keep the operating temperature of electronic devices under control. Continuous gains in chip performance have raised chip power consumption and placed higher demands on heat dissipation technology. In addition, training and inference for large AI models require higher single-card computing power from AI chips, which is expected to further open up growth space for advanced heat dissipation technology.

Principle of heat dissipation technology: from heat pipes and VC to 3D VC and liquid cooling

Heat dissipation technologies are designed to solve thermal management problems in high-performance computing devices: they optimize device performance and extend service life by removing heat directly from the surface of the chip or processor. As chip power consumption has increased, the technology has evolved from the linear (one-dimensional) temperature equalization of heat pipes, to the planar (two-dimensional) temperature equalization of vapor chambers (VC), to integrated three-dimensional temperature equalization, that is, 3D VC technology, and finally to liquid cooling.

Chip heat dissipation innovation: the immersion type has a good heat dissipation effect, and the cold plate type is more mature

According to the ODCC White Paper on Cold Plate Liquid Cooled Server Design, considering factors such as initial investment cost, maintainability, PUE effect, and industry maturity, cold plate and single-phase immersion have more advantages than other liquid cooling technologies and are the mainstream solutions in the industry.

2. Main heat dissipation technologies

Heat pipe: high-efficiency heat transfer device, suitable for high-power and small-space scenarios

A heat pipe is a highly efficient heat transfer device. It quickly transfers heat from one end to the other through the phase change of its internal working fluid, and its structure is simple: a sealed container, a capillary (wick) structure, and a working fluid. Heat pipes are characterized by high thermal conductivity and good temperature uniformity (near-isothermal operation). They are used for high-power chips and in products with little space for heat dissipation, such as notebooks, servers, game consoles, VR/AR devices, and communication equipment.

VC: Higher thermal efficiency and flexibility compared to heat pipes

A vapor chamber (VC), also called a vacuum-chamber heat spreader, is a more advanced and efficient heat-conducting element than a heat pipe, especially for thermal management in high-density electronic devices. Compared with heat pipes, VCs are more thermally efficient and more flexible. The thermal conductivity of copper is 401 W/(m·K); heat pipes can reach 5,000 to 8,000 W/(m·K); and vapor chambers can reach 10,000 to 20,000 W/(m·K) or even higher. A heat pipe conducts heat in one dimension, along its length, as its shape suggests. The shape of a vapor chamber is not restricted: it can be designed to fit the chip layout and can even serve multiple heat sources at different heights.
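To give a feel for what these conductivity figures mean, the sketch below applies Fourier's law for steady one-dimensional conduction, ΔT = P·L / (k·A), to values quoted above. The 50 W load, 10 cm path, and 1 cm² cross-section are hypothetical, and treating a heat pipe or vapor chamber as a solid bar with an effective conductivity is a simplification rather than a model of the phase-change process.

```python
# Illustrative only: the geometry and heat load are assumed, not from the report.

def temperature_drop(power_w: float, length_m: float, area_m2: float, k: float) -> float:
    """Steady-state 1D conduction: temperature difference (K) across a bar."""
    return power_w * length_m / (k * area_m2)

# Thermal conductivities in W/(m·K), taken from the figures quoted in the text.
CONDUCTIVITY = {
    "copper": 401.0,
    "heat pipe": 5000.0,
    "vapor chamber": 20000.0,
}

POWER_W = 50.0    # assumed heat load
LENGTH_M = 0.10   # assumed conduction path length
AREA_M2 = 1e-4    # assumed cross-section (1 cm^2)

for name, k in CONDUCTIVITY.items():
    dt = temperature_drop(POWER_W, LENGTH_M, AREA_M2, k)
    print(f"{name}: {dt:.1f} K")
```

Under these assumed conditions the temperature difference scales inversely with conductivity, so the vapor chamber figure is roughly fifty times smaller than the copper one.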

Computer room air conditioning: Water-cooled air conditioning has a better cooling effect than air-cooled systems

Air-cooled direct expansion system: an air conditioning system used mainly for cooling and heating in small and medium-sized buildings or individual rooms. The refrigerant is generally Freon, and the cooling capacity of a single unit is 10-120 kW. Water-cooled chilled water system: a central air conditioning system that uses water as the cooling medium to transfer heat. It generally consists of chillers, cooling towers, pumps, and piping, and is widely used in large buildings.

Liquid cooling: cold plate type and immersion type liquid cooling are the mainstay

Server liquid cooling is divided into direct cooling and indirect cooling: direct cooling is mainly immersion cooling, and indirect cooling is mainly cold plate cooling. In cold plate liquid cooling, the coolant does not touch the server components directly; heat is exchanged through the cold plate, hence the name indirect liquid cooling. Depending on whether the coolant changes phase in the cold plate, it is divided into single-phase and two-phase cold plate liquid cooling. Immersion liquid cooling is a cooling method in which an entire server or its components are immersed directly in a liquid coolant.

Cold plate liquid cooling: The server needs to be transformed, and the penetration rate is gradually increasing

For example, Inspur Information's design is based on the 2U four-node high-density computing server i24. It adds multiple cold plates in contact with the CPU, I/O, memory, and other heat-generating components, and runs multiple pipes that connect the cold plates inside the server to the cabinet-level manifold outside. About 95% of the system's heat is carried away directly by the liquid through cold plates in contact with the heat sources, and the remaining 5% is carried away by cooling water in the air-liquid heat exchanger behind the PSU.

Cold plate liquid-cooled servers modify the original server structure. Considering responsibility, assembly methods, and other factors, the main players are expected to be the original server manufacturers, which purchase cold plates, pipes, and other materials and then assemble and produce the servers themselves. The average price of cold plate liquid-cooled servers may be higher than that of air-cooled servers, and as penetration increases, server manufacturers are expected to see growth in both volume and price along with improved profitability.

Immersion liquid cooling: the whole server is immersed in liquid, with high technical requirements

Immersion liquid cooling is a cooling method in which an entire server or its components are immersed directly in a liquid coolant. The liquid completely surrounds the server components, allowing more efficient heat absorption and dissipation. Depending on whether the engineered fluid changes phase during heat dissipation, it is divided into single-phase and two-phase immersion liquid cooling. Immersion liquid-cooled servers require multiple design changes, including the enclosure design, motherboard modification, heat dissipation system upgrade, and server airtightness, so the technical requirements are high and the servers are mainly produced by server manufacturers.

3. Market space

Driver 1: Chip protection, temperature control helps chips reach peak performance

Excessive chip temperature degrades device performance and can even damage the electronic device. According to "Carbontech Magazine", performance falls sharply when an electronic device runs too hot: once the chip's operating temperature approaches 70-80 °C, each further 10 °C rise reduces chip performance by about 50%, and more than 55% of electronic equipment failures are caused by excessive temperature. We believe that as large AI models develop and chip performance improves, chip power consumption and operating temperature are rising, which may affect the efficiency of processors and other products. This places higher demands on chip-level heat dissipation and related technologies, so chip-level heat dissipation is expected to open up growth space and see increases in both volume and price.

Driver 2: AI large model development + chip performance growth, chip power consumption continues to increase

CPU and GPU chips account for a relatively high share of server power consumption. According to "Research Progress on Data Center Server Power Consumption Model", CPU, memory, and storage account for 32%, 14%, and 5% of the power consumption of a general-purpose server, respectively. For example, the NVIDIA H100 GPU consumes up to 700 W and the DGX H100 server has a maximum power consumption of 10.2 kW, so GPUs are expected to account for about 55% of the server's total power consumption. Chip power consumption continues to rise: the Ice Lake CPU consumes up to 270 W, and the Granite Rapids CPU is expected to launch in 2024. Also in 2024, NVIDIA will launch the B200 GPU, which will consume 1,000 W. In the future, as chip performance improves and large AI models develop, the power consumption of CPUs, GPUs, and other chips will continue to increase, creating broad demand for advanced heat dissipation devices.
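The roughly 55% GPU share quoted above can be reproduced with simple arithmetic. The sketch below assumes eight GPUs per DGX H100 server, a count that is not stated in the text, and compares peak GPU power with the server's maximum power.

```python
GPU_POWER_W = 700            # H100 GPU power, from the text
SERVER_MAX_POWER_W = 10_200  # DGX H100 maximum power (10.2 kW), from the text
GPUS_PER_SERVER = 8          # assumption: GPU count is not given in the text

def gpu_power_share(gpu_power_w: float, gpu_count: int, server_power_w: float) -> float:
    """Fraction of server power drawn by the GPUs at peak."""
    return gpu_power_w * gpu_count / server_power_w

share = gpu_power_share(GPU_POWER_W, GPUS_PER_SERVER, SERVER_MAX_POWER_W)
print(f"GPU share of server power: {share:.1%}")
```

With these inputs the GPUs draw 5.6 kW of the 10.2 kW total, which is consistent with the share of about 55% given above.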

Driver 3: Policies such as "dual carbon" and "East Data, West Computing" require data center PUE to be reduced

PUE = Total Data Center Energy Consumption / IT Equipment Energy Consumption. PUE is the core indicator for evaluating data center energy efficiency: the closer the value is to 1, the more energy-efficient the data center. The air conditioning system is the second-largest energy consumer in a data center after the IT equipment, so when the IT equipment itself cannot be upgraded, reducing the energy consumption of the air conditioning system is important. When the air conditioning system's share of energy consumption falls from 38% to 18%, the data center's PUE falls from 1.92 to 1.3.
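The relationship between the cooling share and PUE can be sketched as follows. If IT equipment uses a fraction s of total energy, then PUE = 1/s. The "other" overhead shares below (power distribution, lighting, and so on) are assumed values chosen so the result lands near the figures in the text; they are not from the report.

```python
def pue(total_energy_kwh: float, it_energy_kwh: float) -> float:
    """PUE = total data center energy / IT equipment energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_energy_kwh / it_energy_kwh

def pue_from_shares(cooling_share: float, other_share: float) -> float:
    """PUE when cooling and other overheads are given as fractions of total energy."""
    it_share = 1.0 - cooling_share - other_share
    if it_share <= 0:
        raise ValueError("overhead shares must sum to less than 1")
    return 1.0 / it_share

# Cooling shares (38% and 18%) come from the text; other shares are assumptions.
before = pue_from_shares(cooling_share=0.38, other_share=0.10)
after = pue_from_shares(cooling_share=0.18, other_share=0.05)
print(f"PUE before: {before:.2f}, after: {after:.2f}")
```

Under these assumptions the IT share rises from about 52% to about 77% of total energy, which corresponds to a PUE moving from about 1.92 to about 1.3.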

Policies such as "dual carbon" and "East Data, West Computing" require data center PUE to be reduced. According to the Uptime Institute, as of 2022 the average PUE of medium and large data centers worldwide was 1.55, and according to the White Paper on the Development of China's Data Center Industry (Ningxia) (2022), the average PUE of IDCs in China in 2021 was 1.49. Under the "dual carbon" and "East Data, West Computing" policies, the average PUE of new large and ultra-large data centers nationwide is to be reduced to below 1.3, and PUE within the clusters is required to be ≤ 1.25 in the east, ≤ 1.2 in the west, and ≤ 1.15 for advanced demonstration projects.

According to CDCC and Inspur Information, the PUE of an air-cooled data center is generally around 1.4-1.5, while the PUE of a liquid-cooled data center can be reduced to below 1.2, which meets the relevant policy requirements. We believe that adopting more energy-efficient heat dissipation technology is the general trend, and liquid cooling technology may further open up room for growth.
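As a rough illustration of how the policy thresholds quoted above separate air cooling from liquid cooling, the sketch below checks a PUE value against each limit. The sample values 1.45 (midpoint of the air-cooled range) and 1.18 (a hypothetical liquid-cooled value below 1.2) are assumptions for illustration.

```python
# PUE upper limits taken from the policy requirements quoted in the text.
PUE_LIMITS = {
    "east cluster": 1.25,
    "west cluster": 1.20,
    "advanced demonstration": 1.15,
}

def compliant_targets(pue_value: float) -> list[str]:
    """Return the policy targets whose PUE limit the given value satisfies."""
    return [name for name, limit in PUE_LIMITS.items() if pue_value <= limit]

air_cooled = compliant_targets(1.45)     # assumed midpoint of the 1.4-1.5 range
liquid_cooled = compliant_targets(1.18)  # hypothetical value below 1.2
print("air-cooled meets:", air_cooled)
print("liquid-cooled meets:", liquid_cooled)
```

In this sketch the air-cooled value meets none of the limits, while the liquid-cooled value meets the east and west cluster limits but would need further improvement for an advanced demonstration project.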

Chip cooling market: high-end processor shipments increased + power consumption increased, driving both volume and price

As the AI chip and AI server markets expand, and as chip power consumption and heat dissipation requirements rise, we believe growth in the chip-level heat dissipation market is likely to accelerate. The AI chip and AI server markets have grown rapidly. According to Precedence, the global AI chip market is expected to be worth $47.7 billion in 2026, with a CAGR of 29.72% from 2024 to 2026. In Q4 FY2024, NVIDIA's revenue reached $22.1 billion, up 22% quarter-on-quarter and 265% year-on-year, so revenue has at least doubled year-on-year for three consecutive quarters. According to Sti S ti c s, global AI server shipments are expected to reach 2,369,000 units in 2026, with an expected CAGR of 25.50% from 2024 to 2026. AI chip power consumption has risen, and growth in the heat dissipation market is expected to accelerate accordingly. In 2024, NVIDIA will release the B200, which uses the N4P process and packs 208 billion transistors, whereas the H100 has 80 billion transistors and uses the N4 process. The B200's higher integration density brings its power consumption to 1,000 W, which places higher demands on heat dissipation technology.
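The CAGR figures above can be unpacked with the standard compound-growth formula, end = start × (1 + CAGR)^years. The sketch below backs out the 2024 AI chip market size implied by the 2026 forecast. It assumes that "2024 to 2026" means two compounding years, which the text does not state explicitly.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Starting value implied by an ending value and a compound annual growth rate."""
    return end_value / (1.0 + cagr) ** years

def compound_annual_growth_rate(start_value: float, end_value: float, years: int) -> float:
    """CAGR between a starting and an ending value over the given number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

MARKET_2026_USD_BN = 47.7  # AI chip market forecast, from the text
CAGR_2024_2026 = 0.2972    # from the text
YEARS = 2                  # assumption: two compounding periods (2024 -> 2026)

market_2024 = implied_start_value(MARKET_2026_USD_BN, CAGR_2024_2026, YEARS)
print(f"Implied 2024 market size: ${market_2024:.1f} billion")

# Round-trip check: recover the CAGR from the two endpoints.
recovered = compound_annual_growth_rate(market_2024, MARKET_2026_USD_BN, YEARS)
print(f"Recovered CAGR: {recovered:.2%}")
```

Under that assumption the forecast implies a 2024 market of a little over $28 billion.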

Telecom operators: liquid cooling is expected to reach a 50% penetration rate in 2025

Telecom operators may advance liquid cooling technology step by step, from technical verification to large-scale trials and application. In 2023, the three major operators jointly released a white paper on liquid cooling technology that proposed a "three-year vision": 1) 2023: carry out technical verification to fully validate the performance of liquid cooling technology, reduce PUE, and build up technical capabilities in planning, construction, and maintenance; 2) 2024: carry out large-scale testing, piloting liquid cooling technology in 10% of new data center projects; promote the decoupling of liquid-cooled cabinets and servers to encourage competition, mature the industrial ecosystem, and reduce whole-life-cycle cost; 3) 2025: carry out large-scale application, with more than 50% of data center projects applying liquid cooling technology, and jointly promote a high-quality development pattern with unified standards, a complete ecosystem, optimal cost, and large-scale application.

(This article is for informational purposes only and does not represent our investment advice. To use the information, please refer to the original report.)

Selected report source: Future Think Tank.
