3D Scientific Computing: An Efficient Computing Architecture Upending Traditional Scientific Computing

Author: The most technological

In 1983, a research team led by the renowned mathematician Peter Lax wrote a report pointing out: "Large-scale scientific computing is of special importance to national security, scientific and technological progress, and economic development, and is a key sector of modern science and technology. Proceeding from the national interests of the United States, its absolute superiority in large-scale computing must not be shaken."

Image source: Internet

What is scientific computing? And why was it elevated to a matter of US national interest in the 1980s?

Scientific computing refers to the whole process of using computers to reproduce, predict, and discover the laws of motion and evolutionary characteristics of the objective world. It encompasses establishing physical models, studying computational methods, designing parallel algorithms, developing application software, carrying out simulation computations, and analyzing the results. The rise of scientific computing is considered one of the most important scientific advances of the 20th century, and the famous computational physicist and Nobel laureate Kenneth Wilson pointed out as early as the 1980s that computation stands alongside theory and experiment as one of the three pillars of scientific methodology.

The reason scientific computing rose to a national strategic issue in the 1980s is that it broke through the boundaries of traditional experimental and theoretical research and greatly deepened human insight into the natural world and social systems. Its role in disciplines such as the life sciences, medicine, and economics has become ever more critical, and it has become the core technical support for key industries such as weather forecasting, energy exploration, aerospace, transportation planning, manufacturing, and hydraulic engineering.

Scientific computing is indispensable because, in many cases, direct experiments are impossible (e.g., studying the long-term evolution of the Gulf Stream, the global consequences of the greenhouse effect, or the formation mechanisms of tornadoes), impractical (e.g., nuclear facility safety testing, nuclear weapon effects assessment, or pollutant dispersion simulations), or prohibitively costly (e.g., repeated design iterations of aircraft, vehicle crash tests, or structural elucidation of biological macromolecules). Scientific computing fills these gaps in research and engineering practice.

The bottleneck of scientific computing: high-performance computing urgently needs a breakthrough

For scientific computing, the computer is the most important hardware foundation. However, as the problems scientific computing must solve grow more complex and closer to real-world models, the computing resources required become astronomical, far beyond the reach of ordinary computers, and only specialized high-performance computers can carry the load.

High-performance computer (HPC) is a general term for a class of computers with extremely fast computing speed, extremely large storage capacity, and extremely high communication bandwidth; they are commonly known as supercomputers. Generally speaking, a high-performance computer connects a large number of machines into a massively parallel computing cluster through a high-speed network. If a single computer is compared to a human brain, then a high-performance computing cluster is a "super brain" made of many brains wired into a matrix. Compared with ordinary computers, high-performance computing clusters can achieve unimaginable computing speeds. At present, the machine at the top of the TOP500 list of the world's supercomputers, Frontier, has a peak floating-point speed exceeding one exaflop (more than 10^18 floating-point operations per second), which is equivalent to the total processing power of about 1 billion current mainstream laptops working at the same time.
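
As a rough sanity check of the laptop comparison, here is a back-of-the-envelope sketch, not a benchmark; the per-laptop throughput is an assumed figure chosen only to illustrate the scale of the ratio:

```python
# Back-of-the-envelope comparison of an exascale machine with consumer laptops.
# Both figures are rough assumptions used only to illustrate the scale of the ratio.
frontier_peak_flops = 1e18   # ~1 exaflop: 10^18 floating-point operations per second
laptop_flops = 1e9           # assume ~1 GFLOPS of sustained general-purpose throughput

print(f"{frontier_peak_flops / laptop_flops:.0e} laptop-equivalents")  # 1e+09, about one billion
```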

Surely such a formidable high-performance computer is more than enough to meet the performance demands that scientific computing places on computers? Unfortunately, not only is it not enough, it may be far from enough.

One of the main reasons is that existing massively parallel computing programs are too inefficient. As mentioned above, the peak floating-point speed of the most powerful supercomputer, Frontier, exceeds one exaflop per second, but this is a theoretical peak obtained by assuming that every computing node in the Frontier cluster runs unconditionally at full speed. In general, the actual performance of an unoptimized program on a supercomputer is less than 10% of the theoretical peak.

Therefore, how to tailor algorithms and data structures to the application, optimize program performance, and raise the actual floating-point throughput is one of the issues scientific computing cares about most, but doing so usually means great difficulty and a long, costly development cycle. The Gordon Bell Prize, known as the Nobel Prize of the supercomputing industry, is regularly awarded to optimized programs representing the highest application performance in their fields, and even the winning programs are generally only optimized into the 10%-50% efficiency range. For example, the VPIC program used to simulate a laser-plasma interaction model containing one trillion particles won the Gordon Bell Prize in 2008 with a computational efficiency of 25%. This shows how difficult it is to improve a program's computational efficiency.
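
To make the notion of "computational efficiency" concrete, here is a minimal sketch of the ratio being discussed; the exaflop peak figure is illustrative, not a published machine specification:

```python
def efficiency(sustained_flops: float, peak_flops: float) -> float:
    """Computational efficiency: sustained application performance
    divided by the theoretical peak of the machine it runs on."""
    return sustained_flops / peak_flops

# Illustrative numbers: a code sustaining a quarter of a hypothetical 1-exaflop peak
# sits at the 25% efficiency cited above for Gordon Bell-class work.
print(f"{efficiency(0.25e18, 1.0e18):.0%}")  # 25%
```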

On the other hand, the demand of scientific computing for high-performance computing keeps growing, and the limitations of existing computing architectures in handling massively parallel computation and high-bandwidth data transfer are becoming increasingly prominent, calling for a new round of innovation in computing technology. Professor Tang Huazhong of the School of Mathematical Sciences of Peking University once said, "Adopting a new computing structure and giving full play to the efficiency of existing hardware is a new path for the development of scientific computing."

3D scientific computing has revolutionized computing efficiency in the field of scientific computing

In recent years, a computing-architecture concept called "3D scientific computing" has gradually emerged and shown great potential for resolving the efficiency bottlenecks of scientific computing scenarios. In the life sciences, for example, scientific computing essentially means simulating, inside the computer, how proteins interact with other proteins or small molecules in three-dimensional space. To do so, the three-dimensional coordinates of every atom must be represented in the computer, and the change in each atom's position at each time step must be computed according to complex physical formulas. The spatial scale of the problem is therefore inherently three-dimensional.

However, the architecture of traditional high-performance computing systems is usually two-dimensional: servers are connected to one another linearly through switches or routers. When such a two-dimensional architecture handles three-dimensional problems, it naturally generates a large amount of additional communication between nodes, which greatly increases the complexity of the computation. If, instead, the connections between nodes in a huge computing cluster could directly mirror the real-world 3D model, the complexity of the 3D simulation problem could be reduced at the most fundamental level of communication requirements. This is the basic idea of 3D scientific computing.

To make this abstract concept easier to grasp, let us build a simple approximate model of the problem. Suppose we want to model the motion of a methane molecule (CH4) on a computing cluster made of 5 computers, where each computer is responsible for storing and computing the position of exactly 1 atom of the methane molecule. For each computer to calculate the updated position of its atom at the next time step, it needs the positions of the remaining 4 atoms as input parameters for the position-update formula. We can then compare the total communication distance required to solve this problem under a 2D and a 3D architecture (the distance between two adjacent computers counts as 1 unit).

If these 5 computers are strung together in a two-dimensional linear chain, the leftmost computer must pass through the 3 computers in the middle to send data to the rightmost one, and for every computer to obtain the information of the other 4, the total data-communication distance is 40 units. If, instead, the five computers are connected three-dimensionally, imitating the spatial structure of the methane molecule, the total communication distance required is 32 units, which, all else being equal, is 20% more efficient than the two-dimensional scheme.
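
The two totals above (40 units for the linear chain, 32 for the methane-shaped layout) can be reproduced with a short all-pairs shortest-path calculation. The sketch below only illustrates the toy model described in the text; the node names and wiring are assumptions chosen to mirror that description:

```python
from collections import deque

def total_comm_distance(adjacency):
    """Sum of hop counts over all ordered node pairs: every node must fetch
    the position of every other node, so we add up shortest-path distances
    (computed by breadth-first search) from each node to all the others."""
    total = 0
    for src in adjacency:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for node, d in dist.items() if node != src)
    return total

# 2D layout: the five computers strung into a linear chain.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# 3D layout mirroring methane: the "carbon" node is wired directly to all four "hydrogen" nodes.
methane = {"C": ["H1", "H2", "H3", "H4"],
           "H1": ["C"], "H2": ["C"], "H3": ["C"], "H4": ["C"]}

print(total_comm_distance(chain))    # 40 unit distances
print(total_comm_distance(methane))  # 32 unit distances, i.e. 20% less
```

The same function can be fed the graph of a larger molecule (ethane, or a protein fragment) to see how the gap between the two layouts widens as the problem grows.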

For an ethane molecule, only slightly more complex than methane, the three-dimensional connection architecture already improves performance by 42% over the two-dimensional one, and the gap between the two keeps widening as the problem grows more complex. A real-world protein macromolecule often consists of hundreds of thousands to millions of atoms, and at that scale the redundant communication generated by a two-dimensional connection architecture can reach the order of 10^12. In other words, the 3D scientific computing architecture naturally eliminates trillions of communication transfers, giving it an unshakable advantage in performance.

It is fair to say that 3D scientific computing brings an innovation to computing architecture, not only in making fuller use of hardware but also in significantly improving communication efficiency and algorithmic efficiency. The 3D layout shortens physical distances, effectively shortening data-transmission paths and reducing latency through the spatial distribution of servers, while a multi-level network design disperses traffic, relieving single-point pressure and improving the smoothness of data transmission.

However, implementing 3D scientific computing is anything but simple. Because the connections between chip nodes are completely redefined, both the chip itself and the software that runs on it must be redesigned and redeveloped to support computation under this new architecture. At present, systems built for 3D scientific computing usually rely on specially customized application-specific integrated circuits (ASICs). Unlike highly flexible CPUs and GPUs, they cannot run most algorithms and software, but precisely because ASICs do not need to provide general-purpose capability, 100% of the chip area can be devoted to extracting the utmost performance from specific algorithms. High-performance computers built from ASICs are also known as special-purpose supercomputers. Unlike general-purpose supercomputers, they can deliver more than 100 times the performance of the most powerful general-purpose machines in their particular fields, at the cost that each special-purpose supercomputer must be designed for a single specific field and can ultimately solve only problems in that field.

The 3D scientific computer that upended the world's biocomputing paradigm - the Anton supercomputer

At present, realizing 3D scientific computing requires special-purpose supercomputers, whose research, design, and development are enormously difficult and expensive. Nevertheless, Dr. David E. Shaw, an American "mad scientist", has built a 3D scientific computing supercomputer dedicated to biological computing. It has overturned the world's understanding and imagination of biological computing and demonstrated the formidable research and application value that the performance gains of the 3D scientific computing architecture can deliver.

The Anton supercomputer is a typical embodiment of the 3D scientific computing concept. Built on this architecture, Anton can simulate the dynamic motion of proteins in three-dimensional space with unprecedented speed and accuracy, breathing new life into scientific computing research in biology.

The Anton supercomputer | Image source: Internet

Specifically, before Anton appeared, even the world's most powerful general-purpose supercomputers could compute only tens of nanoseconds of simulated motion per day for biological macromolecules, yet even the most basic physiological phenomena, such as protein folding, take microseconds to unfold. That means the strongest supercomputer in the world would need dozens of days to simulate a single simple protein-folding event, and one folding event is barely a drop in the bucket compared with the problems scientists actually want to solve. Many scientists therefore believed that such research tools were of little use for serious problems. The second-generation Anton, however, can simulate nearly 10 microseconds of motion per day, 2-4 orders of magnitude faster than traditional supercomputing. This instantly made computer simulation of proteins a more efficient and finer-grained research method than experiment, and many research ideas that could never be completed by experimental methods alone became feasible overnight. More importantly, for the development of innovative drugs, the addition of biocomputing methods has injected new momentum: supported by Anton, the American pharmaceutical company Relay significantly shortened its drug development cycle, determining the structure of RLY-4008 in only 18 months at a cost of less than 100 million US dollars, upending the traditional "double-ten rule" of drug R&D (that a new drug takes at least 10 years and 1 billion US dollars to develop).
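
Taking the throughput figures in the text at face value, a quick calculation shows the scale of the jump; the 50 ns/day baseline is an assumed value within the "tens of nanoseconds" range mentioned above, and the result depends on which baseline is chosen:

```python
import math

baseline_ns_per_day = 50.0     # assumed: "tens of nanoseconds" of simulated time per day
anton_ns_per_day = 10_000.0    # "nearly 10 microseconds" per day on the second-generation machine

speedup = anton_ns_per_day / baseline_ns_per_day
print(f"~{speedup:.0f}x faster, about {math.log10(speedup):.1f} orders of magnitude")
# ~200x faster, about 2.3 orders of magnitude under these assumed figures
```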

The hardware and software architecture of the Anton supercomputer is a textbook demonstration of 3D scientific computing. Anton uses a large number of application-specific chips as its core components, and these ASICs are tightly interconnected through a carefully designed, high-speed 3D toroidal network. To improve communication efficiency, the Anton servers are packed tightly into a cube-shaped chassis, which shortens the interconnect distance between server nodes, thereby reducing communication latency and improving transmission reliability. For its interconnect, Anton adopts a torus topology. This means that within the cube-shaped chassis making up the cluster, each compute node is directly connected not only to its 8 nearest neighboring nodes but also to nodes at power-of-two distances (e.g., 2, 4, 8). This ensures that communication between any two nodes takes no more than a logarithmic number of hops (on the order of log n, where n is the edge length of the chassis), which significantly reduces the communication complexity between nodes in massively parallel computing. This logarithmic optimization greatly improves the efficiency of data transmission within the cluster, ensuring that data can circulate quickly and effectively among nodes even in a complex computing environment with a very large number of them, fully meeting the needs of communication-intensive tasks.

Image source: D. E. Shaw Research
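
The logarithmic hop bound described above can be illustrated with a one-dimensional sketch. The wiring assumed below (nearest neighbours plus skip links at power-of-two distances along one edge) mirrors the description in the text, not Anton's documented interconnect:

```python
def hops_with_skip_links(offset: int) -> int:
    """Hops needed to cover `offset` positions along one dimension when links
    exist at distances 1, 2, 4, 8, ...; greedily taking the largest usable
    jump means the hop count equals the number of 1-bits in `offset`."""
    hops = 0
    while offset > 0:
        offset -= 1 << (offset.bit_length() - 1)  # take the largest power-of-two jump
        hops += 1
    return hops

edge = 8  # edge length of the chassis in this toy example
worst_case = max(hops_with_skip_links(d) for d in range(edge))
print(worst_case)  # 3 hops = log2(8); nearest-neighbour-only routing would need up to 7
```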

The rise of the 3D scientific computing concept marks an opportunity for scientific computing to accelerate the transformation of many sub-fields. With its distinctive computing architecture, 3D scientific computing offers a new approach to the challenges of high-dimensional data processing, complex-system simulation, and large-scale parallel computing through the optimized layout of physical space and the restructuring of data flow. As a model of combining 3D scientific computing with industrial practice, the Anton supercomputer has brought a revolution to drug discovery and shown the world how the iteration of efficient research tools drives the development and progress of specialized fields. In the future, 3D scientific computing is expected to broaden its range of applications across high-tech industries and provide more powerful theoretical support and practical tools for tackling the major scientific and technological challenges facing the world.