
My parents asked me if the GPU was for gaming, and I said it was for training AI

Author: China Science Expo

Eight years ago, in the summer after the college entrance examination, as the "computer enthusiast" of my class I was constantly asked by classmates what kind of computer to buy for college. I told everyone that a computer has two important parts: one is called the CPU, which determines how quickly the computer responds; the other is called the GPU, which handles graphics computation and is what lets you play computer games smoothly.


GPUs for desktop computers and GPUs for data center computers (Image source: Amazon)

Back then, GPUs were for playing games. Eight years later, with the advent of large language models (LLMs) such as ChatGPT, GPUs capable of AI high-performance computing have suddenly become something major tech companies scramble to buy. Nvidia, which bills itself as "the leader in AI computing," saw its share price soar within a year and became the world's most valuable company on June 19. And if a student who has just finished the college entrance examination wants a computer with a GPU, the pitch for angel investment from the "Mom and Dad round" has also changed to "I want to study artificial intelligence."


GPU maker Nvidia (Image source: Nvidia)

In these eight years, how did the GPU evolve from a "game processing unit" into an "AI computing platform"? With this question in mind, let's talk about what GPUs are all about.

Anything a GPU can do, a CPU can also do. So why do we need GPUs?

Compared with the GPU, you are probably more familiar with the computer's CPU, the Central Processing Unit. The CPU is the "brain" of the computer: it controls the other components and coordinates them to complete all of the machine's "computing tasks," such as web browsing, game rendering, and video playback. The CPU largely determines how fast the computer runs. There is a popular saying online, "my CPU is burning up," which takes a CPU overheating and failing as a metaphor for having so much on your mind, and such tangled thoughts, that your brain can't keep up.


Left: the CPU is the "brain" of the computer (Image source: veer). Right: an Intel i9-12900KS CPU (Image source: hothardware)

The GPU is one of the components directed and controlled by the CPU. GPU stands for Graphics Processing Unit, and its main job is to handle graphics-related computing tasks.


NVIDIA GeForce 6600 GT GPU (Image source: wiki)


A CPU and GPU installed in a desktop case; this is a concept image from the NVIDIA website of a case with an RTX 40-series GPU (Image source: NVIDIA)

In theory, any computing task a GPU can handle, the CPU can handle too. A computer without a discrete GPU can still boot normally, but without a CPU it simply cannot run at all. So why spend a small fortune on a GPU? Because, just as it is meaningless to discuss toxicity without discussing dose, in computing it is meaningless to discuss "function" without discussing "performance."


CPUs and GPUs each have their own computing tasks (Image source: IBE)

For computing tasks such as game rendering and artificial intelligence, the CPU is simply not good enough: it can complete them at the functional level, but its performance is very limited. In game rendering, limited performance shows up as lower frame rates and rougher image quality, which may be tolerable. In artificial intelligence, however, taking the large language model GPT as an example, training on CPUs alone would take hundreds of years, which obviously cannot satisfy our appetite for breakthroughs in AI technology.

GPUs happen to excel at exactly these difficult, in-demand tasks. In a computer equipped with a GPU, the CPU no longer does everything itself; it offloads the tasks it is bad at to the GPU for acceleration. Thanks to its specialized parallel architecture, the GPU handles these tasks with ease, effortlessly delivering richly detailed 3D graphics and allowing companies like OpenAI to push out new advances every few months.

How does a GPU accelerate computation? The difference between GPUs and CPUs from the perspective of parallel architecture

Before we begin, let's set up an analogy: a program running on a computer can be compared to a test paper made up of a series of problems; this is roughly how Alan Turing, the founding father of computer science, conceived of computer programs. These problems range from easy to hard, from the arithmetic an elementary school student can breeze through, to the trigonometry a high school student can handle, to the calculus that only a college student fully masters.

Computer scientists once hoped that the CPU could work like a seasoned "mathematician," using his computing power (arithmetic instructions) and keen judgment (control instructions) to finish the test paper (the program) quickly and deliver the results the user wants. But as programs grew more and more complex, the old mathematician gradually could not keep up. His main weakness: however fast he could solve any single problem, he could only ever work on one problem at a time.


The working framework of a single-core CPU computer, with the black line being the data stream and the red line being the control stream, both processed by a single CPU (Image source: wiki)

So companies such as Intel, IBM, and AMD realized they could "hire" several mathematicians inside a single CPU, which gives us today's familiar multi-core CPUs. Of course, constrained by chip heat dissipation and manufacturing yield, each of these many mathematicians is not as strong as the old master; they are more like a group of "college students" cramming for a calculus exam. Why college students? Because they forget it all the moment the exam is over (ouch).


The fourth-generation AMD EPYC processor architecture. EPYC is AMD's high-performance server processor family; each Zen 4 block in the picture is one of the "college student" cores we are talking about, and a single CPU can contain 16 to 96 of them (Image source: AMD official website)

A GPU, by contrast, can be compared to a huge computing team made up of a few thousand to tens of thousands of "elementary school student" cores. Compared with a college-student core, a single elementary-school core can only perform simpler operations, and its speed is only about 1/4 to 1/3 that of a college student. The figure below shows the architecture of the NVIDIA H100 GPU; the small green squares are the "elementary school" cores, 18,432 of them in total. These ten-thousand-plus elementary-school cores are first divided into more than a hundred "classes," and within each class they are further dynamically grouped into "duty groups" of 32 to 64 students.


Nvidia H100 GPU architecture diagram, the GPU contains many small cores, organized according to a hierarchical structure (Image source: NVIDIA H100 architecture white paper)

How do you get so many elementary school students to collaborate effectively? Having each student work through the whole test paper independently, the way a CPU core does, is obviously not sensible. To solve the collaboration problem, NVIDIA GPUs impose a rule: at any given moment, all cores (threads) in a "duty group" may execute only one and the same operation (instruction), and one or more duty groups (possibly spanning classes) are organized to work through a test paper (program) together. This mode of collaboration is called Single Instruction, Multiple Threads (SIMT), and it is the core execution model of NVIDIA GPUs. The diagram below compares the parallel operation of the CPU and the GPU.


Schematic comparison of multi-core and SIMT parallelism (diagram by the author)
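To make the "duty group" idea concrete, here is a minimal CUDA kernel sketch (the function name vector_add and the comments are illustrative, not taken from any particular product). Every thread in a duty group, which NVIDIA hardware implements as a 32-thread warp, executes the same instruction, each on its own array element:

```cuda
// Each thread is one "elementary school student": all students in a
// duty group (warp) execute the same instruction at the same time,
// but each works on its own element of the arrays.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    // Threads are grouped into blocks ("classes"); blockIdx/threadIdx
    // tell each thread which element of the test paper is "its problem".
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];   // the same add, repeated across thousands of threads
    }
}
```

A later sketch in the CUDA section below shows the host-side code that copies data to the GPU and launches a kernel like this.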

The sharp-eyed reader will already have noticed that this peculiar way of organizing work makes a real difference in what the CPU and the GPU are each good at. If a test paper for the CPU (a single core) looks like this:

(Image: a sample test paper for the CPU)

Then the GPU paper would look like this (assuming a duty group has three cores):

(Image: a sample test paper for the GPU)

Strange, isn't it? Who would ever set a test paper like that? The fact is, graphics rendering and AI applications fit it exactly. In graphics rendering, the input 3D model is made up of many discrete vertex coordinates, and during rendering the same coordinate transformations and lighting calculations must be applied to many vertices in parallel. In AI applications, real-world inputs such as images and text are represented as large matrices and vectors, and matrix and vector operations can likewise be broken down into many values undergoing the same operation in parallel. This is exactly what SIMT parallelism is built for!
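As a rough sketch of how this looks for AI-style math (real deep learning libraries use far more elaborate tiled implementations, so this is an illustration only), a matrix-vector product y = A·x can be written so that each thread computes one element of y:

```cuda
// Sketch: y = A * x, with one thread per output element.
// Every thread runs the same loop body (the same instructions);
// only the row it works on differs, i.e. "the same problem with
// different numbers", just like the GPU test paper above.
__global__ void matvec(const float* A, const float* x, float* y,
                       int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        for (int j = 0; j < cols; ++j) {
            sum += A[row * cols + j] * x[j];   // identical multiply-add everywhere
        }
        y[row] = sum;
    }
}
```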

If the GPU's parallel mode is so good, why don't we just design the CPU the same way? The answer is that, compared with multi-core parallelism, SIMT parallelism gives up a certain amount of "generality." In terms of our analogy: not every test paper can be organized to suit the GPU. Take the Internet applications we use every day, such as WeChat and Taobao. Every user is sending different messages and browsing different products, so it is hard to get the GPU's duty groups to collaborate efficiently.
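A small sketch of what this loss of generality looks like in code (the helper functions path_a and path_b are hypothetical, only there to create two different code paths): when threads in the same duty group need to go down different branches, the lockstep model breaks down and the hardware has to run the branches one after another.

```cuda
// Hypothetical per-request work, just to create two distinct code paths.
__device__ float path_a(int i) { return i * 2.0f; }
__device__ float path_b(int i) { return i * 0.5f + 1.0f; }

__global__ void handle_requests(const int* request_type, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // If the 32 threads of one warp see a mix of request types,
        // the two branches are executed serially, with part of the
        // warp idle each time; the "duty group" no longer works in
        // lockstep, and much of the SIMT speedup is lost.
        if (request_type[i] == 0) {
            out[i] = path_a(i);
        } else {
            out[i] = path_b(i);
        }
    }
}
```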

How to Ride the "East Wind" of AI: From GPU to GPGPU

In 1999, NVIDIA released the world's first GPU, the GeForce 256, which integrated hardware acceleration for graphics operations such as transform, clipping, and rendering, and gave the "graphics processing unit" its name.


GeForce 256 GPU (Image source: wiki)

In the early days, GPUs offered very limited programmability: most users could only call fixed programming interfaces to perform fixed graphics operations. Even so, some forward-looking computer scientists recognized the parallel-computing potential hidden in GPUs. They tried mapping scientific computing problems such as ocean-current simulation and atmospheric simulation onto the graphics operations GPUs did support, and reaped real performance gains.


GPU board with GeForce 256 chip (Image source: VGA Museum)

Perhaps inspired by these "accidental" successes, NVIDIA launched the CUDA (Compute Unified Device Architecture) programming framework in 2007, fully opening up the GPU's programmability to developers. With CUDA, users can write parallel programs for the GPU in a C/C++-like style. From then on, the GPU acquired a new name, GPGPU (General-Purpose computing on Graphics Processing Units), with the extra "GP" expressing its support for general-purpose computing.
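To give a feel for what "C/C++-like" means, here is a minimal, self-contained CUDA sketch of the classic vector addition (it assumes a machine with an NVIDIA GPU and the CUDA toolkit, compiled with nvcc, and it is an illustration rather than an excerpt from NVIDIA's official samples):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: every thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                       // about a million elements
    const size_t bytes = n * sizeof(float);

    // Prepare input data on the CPU ("host") side.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate GPU ("device") memory and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch the kernel: blocks ("classes") of 256 threads, enough to cover n.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and spot-check one value.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);               // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

The pattern of allocating device memory, copying data over, launching a kernel across thousands of threads, and copying the result back is the skeleton of virtually every CUDA program, from this toy example to deep learning training.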

In 2006, shortly before CUDA was launched, the AI pioneer Geoffrey Hinton and his team improved the training of deep neural networks, and deep learning, a descendant of the "connectionist" school, began to regain momentum. In 2012, the image-classification model AlexNet won the ImageNet competition, igniting enthusiasm for deep learning research, and training with the help of GPGPUs became widely adopted. In 2014, NVIDIA released the cuDNN deep learning acceleration library, making GPU-based deep learning training even easier.

In 2016, Google's Go-playing AI AlphaGo defeated the South Korean professional Lee Sedol; an early version of AlphaGo ran on 176 GPUs.


Trained on hundreds of GPUs, AlphaGo defeated South Korean player Lee Sedol (Image source: DeepMind)

At the end of 2022, the chatbot ChatGPT made a stunning debut, setting off a new wave of enthusiasm for artificial intelligence. ChatGPT is built on a large language model that learns to understand and generate human language by training on massive amounts of text. Although no official figures have been published, researchers in the field generally believe that OpenAI used thousands to tens of thousands of top-of-the-line NVIDIA A100 GPUs to support the training.


The chatbot ChatGPT has set off a new wave of artificial intelligence research (Image source: Pexels)

Against this backdrop, GPUs that can supply the computing power large models demand have become a "hot commodity" fought over by major technology companies. With its dominant market position and more than a decade of accumulated technology, NVIDIA seized the moment in this AI frenzy and completed a dazzling transformation from "game processor" maker to "AI computing platform."

NVIDIA dominates the global GPU market, but because of U.S. government export restrictions, its high-end GPU models cannot be sold to mainland China. Developing and designing an AI computing platform with independent intellectual property rights is therefore one of the key challenges facing computer researchers in China. So, if you have just finished the college entrance examination and "want to study artificial intelligence," are you willing to join this journey toward the future?

Author: Gao Ruihao

Institute of Computing Technology, Chinese Academy of Sciences
