
Artificial Intelligence Chips: Conceptual Implications and Why They Matter

AI will play an important role in national and international security for the foreseeable future. As a result, the United States government is considering how to control the spread of AI-related information and technologies. Because general AI software, datasets, and algorithms are difficult to control effectively, the computer hardware required for modern AI systems has naturally become a focus. Leading-edge, specialized "AI chips" are essential for the cost-effective, large-scale development and deployment of AI. Against this background, Georgetown University's Center for Security and Emerging Technology (CSET) released the report "AI Chips: What They Are and Why They Matter," which explains what AI chips are, why they are indispensable for large-scale AI development and deployment, and what they imply for national competitiveness.

1. Industry development favors artificial intelligence chips over general-purpose chips

(1) The law of chip innovation

The development of all computer chips, including general-purpose central processing units (CPUs) and specialized chips such as AI chips, has benefited from smaller transistors, which switch faster and consume less power than larger ones. However, at least through the first decade of the 21st century, even as transistors shrank rapidly and delivered large gains in speed and efficiency, the design value of specialized chips remained low and general-purpose CPUs dominated the market.

As the technology for shrinking transistors has continued to advance, transistor density on chips has kept increasing. In the 1960s, Moore's Law observed that the number of transistors on a chip doubles approximately every two years, and CPU speeds improved enormously by following this trajectory. The speed gains were achieved mainly through "frequency scaling": transistors switch between the ON (1) and OFF (0) states faster, allowing a given execution unit to perform more calculations per second. In addition, smaller transistors each consume less power, which significantly increases chip efficiency.
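
To make the doubling arithmetic concrete, here is a minimal sketch of the growth curve Moore's Law implies. The starting density and time span are hypothetical placeholders chosen for illustration, not figures from the report.

```python
# Minimal sketch of the transistor-density growth implied by Moore's Law:
# density doubles roughly every two years.

def projected_density(initial_density: float, years: float,
                      doubling_period: float = 2.0) -> float:
    """Projected transistor density after `years`, assuming one doubling
    every `doubling_period` years."""
    return initial_density * 2 ** (years / doubling_period)

# Hypothetical example: a chip with 1 million transistors per mm^2 today
# would be projected to reach ~32 million per mm^2 after 10 years.
if __name__ == "__main__":
    print(projected_density(1e6, 10))  # -> 32,000,000
```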

As transistors shrink and density rises, new chip designs become possible, and new chips run faster and more efficiently. CPUs can integrate more execution units of different types, each optimized for a different function. At the same time, more on-chip memory reduces the need to access off-chip memory, speeding up memory access. CPUs also gain headroom for architectures that compute in parallel rather than serially. Relatedly, as higher transistor density makes each CPU smaller, a single device can accommodate multiple CPUs running different computations at the same time.
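
As a rough illustration of the serial-versus-parallel distinction described above, the sketch below compares an element-by-element Python loop with a vectorized NumPy operation that dispatches to optimized native routines able to exploit a chip's parallel execution units. It is an illustrative comparison under these assumptions, not a benchmark of any particular CPU.

```python
# Illustrative comparison of serial vs. data-parallel (vectorized) computation.
import time
import numpy as np

x = np.random.rand(10_000_000)

# Serial: one multiplication per loop iteration, handled by the interpreter.
start = time.perf_counter()
serial_result = [v * 2.0 for v in x]
serial_time = time.perf_counter() - start

# Vectorized: the whole array is processed by optimized native code that can
# use the chip's parallel execution units.
start = time.perf_counter()
vector_result = x * 2.0
vector_time = time.perf_counter() - start

print(f"serial: {serial_time:.3f}s, vectorized: {vector_time:.3f}s")
```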

(2) The slowdown of Moore's Law and the decline of general-purpose chips

As transistors shrink to only a few atoms across, their size is rapidly approaching an absolute lower limit, and a variety of physical problems at small scales make further shrinkage increasingly difficult. This has driven unsustainable growth in the semiconductor industry's capital expenditures and talent costs, and new process technology nodes are being introduced more slowly than in the past. Moore's Law is therefore slowing down: it is taking longer and longer for transistor density to double.

In the era of general-purpose chips, design costs could be spread across the millions of chips sold. Specialized chips, although better at specific tasks, could not count on enough sales to recoup their high design costs, and their computing advantages were quickly erased by the next generation of CPUs. Today, the slowdown of Moore's Law means that CPUs are no longer improving quickly, undermining the economies of scale of general-purpose chips. At the same time, key improvements in semiconductor capabilities have shifted from being manufacturing-driven to being design- and software-driven, and demand is growing for AI applications that rely on the highly parallelizable, predictable computations that specialized chips handle well.

These factors are driving the development of chips in the direction of artificial intelligence specialization, prompting artificial intelligence chips to seize the market share of CPUs.

2. The main characteristics of artificial intelligence chips

AI chips are a common type of specialized chip that share several features. First, AI chips can perform far more calculations in parallel than CPUs. Second, they can implement AI algorithms successfully in low-precision computing modes, which reduces the number of transistors needed for the same calculation. Third, they accelerate memory access, for example by storing an entire algorithm on a single AI chip. Fourth, they use specialized programming languages to translate AI code efficiently for execution on the chip. To be clear, AI chips are computer chips designed to run AI computations efficiently and at high speed, at the cost of running other, general-purpose computations with lower efficiency and speed.
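
To illustrate the low-precision point, the following sketch compares a matrix multiplication in 32-bit and 16-bit floating point using NumPy. The matrix sizes are arbitrary, and real AI chips implement reduced-precision arithmetic in hardware rather than in software as here; this only shows why lower precision is often acceptable.

```python
# Minimal illustration of low-precision computation: many AI workloads
# tolerate float16 (or int8) arithmetic, which needs fewer transistors and
# less memory per operation than float32.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)

full = a @ b                                          # float32 reference
low = a.astype(np.float16) @ b.astype(np.float16)     # float16 approximation

# The low-precision result is close enough for many neural-network layers.
rel_error = np.abs(low.astype(np.float32) - full).max() / np.abs(full).max()
print(f"max relative error at float16: {rel_error:.4f}")
print(f"bytes per element: float32={np.float32().itemsize}, "
      f"float16={np.float16().itemsize}")
```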

There are three types of AI chips: graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). GPUs were originally designed for image processing; starting in 2012 they were increasingly used to train AI systems, and that application has dominated since 2017. GPUs are also sometimes used for inference, but even though they offer far more parallelism than CPUs, they are still designed for general-purpose computing. More specialized FPGAs and ASICs are more efficient than GPUs; they are increasingly prominent for inference, and ASICs are also increasingly used for training. FPGAs consist of many logic blocks (modules containing a set of transistors) whose interconnections can be reconfigured by a programmer after manufacturing to suit a specific algorithm, whereas ASICs contain hard-wired circuitry tailored to a specific algorithm. Leading-edge ASICs generally offer higher efficiency than FPGAs, while FPGAs are more customizable and allow the design to be optimized as algorithms evolve; ASICs, by contrast, grow increasingly obsolete as algorithms iterate.

Machine learning is an important approach to implementing AI, and it involves two phases: training and inference. Put simply, training is the stage of searching for the optimal parameters of a model; once the parameters have been found and the model is deployed and used, that stage is called inference. Because the tasks in training and inference place different demands on a chip, the two may use different AI chips. First, training and inference require different forms of data parallelism and model parallelism, and training requires additional computational steps on top of those it shares with inference. Second, training essentially always benefits from data parallelism, but inference does not; for example, sometimes only a single inference on a single piece of data is needed. Finally, the relative importance of efficiency and speed differs between training and inference depending on the use case.
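
A minimal sketch of the training/inference distinction, using a linear model in NumPy: the data, learning rate, and parameter values are made up for illustration, and real systems would use a framework such as PyTorch or JAX running on an AI chip.

```python
# Minimal sketch: training searches for model parameters; inference applies
# the already-trained parameters to new data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))            # training inputs (a whole batch)
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.standard_normal(1000)

# Training: iteratively adjust parameters (extra steps: gradients, updates),
# naturally data-parallel across the batch.
w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(X)
    w -= lr * grad

# Inference: a single forward pass, possibly on just one example.
x_new = np.array([1.0, 2.0, 3.0])
print("prediction:", x_new @ w)
```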

How readily AI chips can be commercialized depends on how general-purpose they are. GPUs have long been widely commercialized, while FPGAs are less so. ASICs, meanwhile, have high design costs, and their specialization limits sales volumes, making them comparatively hard to commercialize. However, the projected growth of the AI chip market may create the economies of scale needed to make even narrowly specialized ASICs profitable.

AI chips can also be divided into grades by performance. At the high end, server-grade AI chips are typically used in high-performance data centers and have larger packages than other AI chips. In the middle are the personal-computer AI chips commonly used by consumers. At the low end, mobile AI chips are typically used for inference and are integrated into a system-on-chip that also contains a CPU.

3. Why AI needs cutting-edge artificial intelligence chips

AI chips are typically 10 to 1,000 times more efficient and faster than CPUs. An AI chip that is 1,000 times as efficient as a CPU provides an improvement equivalent to 26 years of Moore's Law-driven CPU improvements.
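
One way to recover the 26-year figure is the back-of-envelope calculation below. The assumed doubling period of roughly 2.6 years per CPU efficiency doubling is an assumption of this sketch chosen to match the stated figure, not a number given in the article.

```python
# Back-of-envelope check of the "26 years" equivalence. The doubling period
# below is an assumption of this sketch, not a figure from the report.
import math

speedup = 1000                        # AI chip vs. CPU efficiency ratio
doublings = math.log2(speedup)        # ~10 doublings
years_per_doubling = 2.6              # assumed Moore's-Law-era improvement rate
print(f"{doublings * years_per_doubling:.1f} years")  # ~25.9 years
```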

(1) A cost-benefit analysis of using cutting-edge artificial intelligence chips

Cutting-edge AI systems require not just AI chips but state-of-the-art AI chips. Ordinary chips are larger, slower, and more power-hungry, so the electricity cost of training an AI model on them quickly balloons to an unaffordable level.

Comparing the costs of cutting-edge AI chips (7nm or 5nm) with those of ordinary chips (90nm or 65nm) yields two main conclusions. First, in terms of production and operating costs, cutting-edge AI chips are far more economical: after two years of use, the electricity cost of ordinary chips reaches three to four times the cost of the chips themselves and keeps climbing each year, whereas the electricity cost of cutting-edge AI chips only slightly exceeds the cost of the chip itself. Second, it is estimated that a 5nm chip would have to be produced and operated for about 8.8 years before its total cost matches that of a 7nm chip; below 8.8 years of use, 7nm chips are cheaper, and beyond 8.8 years, 5nm chips are cheaper. Users therefore have an incentive to switch to 5nm chips only if they expect to use them for roughly 8.8 years or more.
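
The break-even logic can be sketched as below. The chip prices and annual energy costs are hypothetical placeholders that merely illustrate how an 8.8-year crossover could arise; they are not the report's actual inputs.

```python
# Hypothetical total-cost-of-ownership sketch: a newer node costs more to buy
# but less to run, so it only wins if kept long enough. All numbers are
# illustrative placeholders, not figures from the report.

def total_cost(chip_cost: float, energy_cost_per_year: float, years: float) -> float:
    return chip_cost + energy_cost_per_year * years

def breakeven_years(chip_a: float, energy_a: float,
                    chip_b: float, energy_b: float) -> float:
    """Years of use at which chip B (pricier, more efficient) matches chip A."""
    return (chip_b - chip_a) / (energy_a - energy_b)

# Placeholder example loosely mimicking a 7nm-vs-5nm comparison.
chip_7nm, energy_7nm = 10_000, 3_000   # cheaper chip, higher running cost
chip_5nm, energy_5nm = 16_600, 2_250   # pricier chip, lower running cost
print(breakeven_years(chip_7nm, energy_7nm, chip_5nm, energy_5nm))  # ~8.8 years
```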

Companies typically replace server-grade chips after about three years of operation, but buyers of 5nm chips may expect to use them for longer, so this slowdown in replacement demand mirrors the slowing of Moore's Law. As a result, 3nm chips may not be launched for quite some time.

(2) Chip cost and speed are bottlenecks for compute-intensive AI algorithms

The time and money that businesses spend on AI-related computing have become a bottleneck for progress in the technology. Because cutting-edge AI chips are more cost-effective and faster than older chips or cutting-edge CPUs, AI companies and labs need such chips to keep pushing the boundaries of the technology.

First, DeepMind has developed a series of leading AI systems (such as AlphaGo), some of which are estimated to have cost as much as $100 million to train. OpenAI reported total costs of $28 million in 2017, of which $8 million went to cloud computing. If that computing were run on older AI chips or cutting-edge CPUs, the computing cost would be multiplied by roughly 30 or more, making such AI training and experimentation economically prohibitive. Computing costs are growing so fast that they may soon hit a ceiling, which is why the most efficient AI chips are needed.

Second, leading AI experiments can take days or even a month to train, and deployed, mission-critical AI systems often require fast or real-time inference. Using older AI chips or cutting-edge CPUs would dramatically lengthen these times, making the iteration needed for AI R&D, and the inference of deployed critical AI systems, unacceptably slow.

One limitation of the above analysis is that some of the recent AI breakthroughs
