CPU-GPU interconnect reach jumps from 1 meter to 100 meters. Intel: do you believe in light?

Jin Lei, reporting from Aofei Temple

QbitAI | WeChat official account: QbitAI

Intel is using "light" to tackle the thorny compute problems of the large-model era:

It has launched the industry's first fully integrated OCI (Optical Compute Interconnect) chiplet.

△ Source: Intel

Keep in mind that as large AI models follow scaling laws, both model size and dataset size keep growing in pursuit of better results.

At the compute level, this places higher demands on computing and storage as a whole, including the I/O communication that sits between them.

Intel's breakthrough this time is in I/O communication:

For data transfer between CPUs and GPUs, optical I/O replaces electrical I/O.

What does that buy you?

In short, data travels much farther, in larger volumes, and at lower power, which suits the "physique" of large AI models better.

△ Source: Intel

So why is Intel using "light"? And how?

With "light", the horse-drawn cart becomes a truck

Traditional electrical I/O (copper interconnect) has its advantages, such as high bandwidth density and low power consumption, but its fatal flaw is short reach (less than 1 meter).

That is no problem inside a single rack, but server clusters are the standard source of compute for large AI models.

Such clusters take up a lot of floor space and span many racks, so cables need to run tens or even hundreds of meters, and power consumption is high. The interconnect eats into the power supplied to the rack, leaving too little for the compute and memory chips to do their reading and writing.

There is also the memory-to-compute ratio: precisely because large models are so large, what used to be one memory read per several hundred computations is now close to 1:1.

△ Source: Intel

This calls for a new approach that reduces power consumption and footprint while increasing compute and storage density, so that more compute and storage fit in a limited space.

Optical I/O solves the problem:

It supports 64 channels of 32 Gbps each in each direction over fiber up to 100 meters long.

A vivid analogy: it is like switching from horse-drawn carts (limited capacity and range) to cars and trucks (more cargo, over longer distances) for delivering goods.

And for denser, more flexible data transfer over relatively short distances, OCI can be compared to a motorcycle: faster and more nimble.

It is worth noting that OCI is not just a theoretical approach.

According to Intel, it has used its field-proven silicon photonics technology to integrate a silicon photonic integrated circuit (PIC), which includes on-chip lasers and optical amplifiers, with an electronic integrated circuit (EIC).

Intel has also demonstrated an OCI chiplet co-packaged with its own CPU, and the chiplet can be integrated with next-generation CPUs, GPUs, IPUs, and other SoCs (systems-on-chip) as well.

And that's not all: Intel has shipped more than 8 million silicon photonic integrated circuits, which together contain more than 32 million on-chip lasers.

△ Source: Intel

That raises the next question:

How was Intel's OCI "forged"?

Song Jiqiang, vice president of Intel Labs and president of Intel Labs China, gave an in-depth explanation.

△ Song Jiqiang, Vice President of Intel Labs and President of Intel Labs China

Silicon photonics technology is a combination of two of the most important inventions of the 20th century: silicon integrated circuits and semiconductor lasers.

It enables faster data transfer over longer distances than traditional electronics, while leveraging the efficiencies of Intel's high-volume silicon manufacturing.

With the silicon photonics integration technology Intel has released this time, the OCI chiplet reaches the level of optoelectronic co-packaging.

Optoelectronic co-packaging here means placing a silicon photonic integrated circuit (PIC) and an electronic integrated circuit (EIC) on one substrate to form an OCI chiplet, which acts as a single integrated connectivity component.

This means that xPUs, including CPUs and future GPUs, can be packaged together with OCI chiplets.

The OCI chiplet converts the electrical I/O signals coming from a data center CPU into light, which travels over optical fiber between nodes or systems in the data center.

It currently reaches 4 Tbps of bidirectional data transfer, is compatible with PCIe 5.0 at the upper protocol layer, and supports 64 lanes of 32 Gbps in each direction, which is sufficient for today's data centers.

It uses 8 fiber pairs and consumes only 5 picojoules (pJ) per bit (1 pJ is 10^-12 joules), one third of the energy of pluggable optical transceiver modules (about 15 pJ per bit).
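
To get a feel for what the energy-per-bit figures mean, here is a rough back-of-the-envelope sketch. It assumes the quoted pJ/bit values apply across the full 4 Tbps bidirectional rate, and that a pluggable module would be run at the same rate for comparison; both are simplifying assumptions, not Intel's figures, and the function name is made up for illustration.

```python
# Rough link power estimate from energy per bit.
# Inputs from the article: 5 pJ/bit (OCI), about 15 pJ/bit (pluggable), 4 Tbps bidirectional.
# Assumption: the pJ/bit figure holds across the whole 4 Tbps; real designs will differ.

def link_power_watts(energy_pj_per_bit: float, rate_tbps: float) -> float:
    """Power (W) = energy per bit (J/bit) * bit rate (bit/s).

    With pJ/bit and Tbps the 1e-12 and 1e12 prefixes cancel,
    so the product is already in watts.
    """
    return energy_pj_per_bit * rate_tbps

oci_w = link_power_watts(5, 4)         # OCI chiplet
pluggable_w = link_power_watts(15, 4)  # pluggable module at the same rate (hypothetical)

print(f"OCI: {oci_w} W, pluggable: {pluggable_w} W, ratio: {pluggable_w / oci_w}x")
```

Under these assumptions the optical I/O itself would draw on the order of 20 W instead of 60 W for the same traffic, which is where the "one third" comes from.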

△ Source: Intel

Each optical channel actually carries 8 different wavelength bands spaced 200 GHz apart, so it occupies 1.6 THz of spectrum in total.

Light spans the visible and the invisible, so its spectrum is in fact very wide, and from the THz range upward you are already approaching optical communication.
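
The spectrum arithmetic can be sketched in a few lines. This only uses the two numbers quoted above (8 bands, 200 GHz spacing) and works with offsets relative to the first band, since the article does not give absolute frequencies; the variable names are illustrative.

```python
# Spectrum occupied by 8 wavelength bands on a 200 GHz grid.
# Offsets are relative to the first band; absolute frequencies are not given in the article.
CHANNELS = 8
SPACING_GHZ = 200

# Centre of each band, measured from the first one.
offsets_ghz = [i * SPACING_GHZ for i in range(CHANNELS)]

# Each band gets a 200 GHz slot, so 8 slots occupy 1.6 THz in total.
occupied_thz = CHANNELS * SPACING_GHZ / 1000

# The first and last band centres are only 7 spacings apart.
first_to_last_thz = offsets_ghz[-1] / 1000

print(offsets_ghz)
print(occupied_thz, first_to_last_thz)
```

Note the distinction: the 1.6 THz figure counts one 200 GHz slot per band, while the distance between the first and last band centres is 1.4 THz.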

So where will OCI chiplets be used in the future?

On this, Song Jiqiang said:

First, it can be used for communication. It can also be co-packaged with compute chips such as CPUs and GPUs, so that compute and communication are packaged very tightly together.

Through silicon photonics integration and advanced packaging (Intel has many different advanced packaging technologies), we can build higher-density I/O chiplets and combine them with other xPU chips. In the future, many different kinds of compute and interconnect chips will be built from chiplets, so the application prospects are very good.

According to the performance roadmap for OCI I/O interface chiplets, the current technical approach can eventually reach 32 Tbps, relying mainly on steady, iterative improvement of three metrics, which currently stand at:

  • 8 stable wavelength bands per fiber
  • 32 Gbps optical data rate per band
  • 8 fiber pairs driven at the same time without interfering with one another

Multiplied together, these three metrics give about 2 Tbps in one direction and 4 Tbps in both directions. Going forward, the design can keep evolving to raise bandwidth step by step.
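
The multiplication behind those numbers can be sketched as follows. The current values come from the article; the "future" combination is purely hypothetical, chosen only to show how scaling all three metrics could land near 32 Tbps, and is not Intel's stated plan. The function name is made up for illustration.

```python
# Aggregate OCI bandwidth from the three metrics in the article.

def aggregate_gbps(bands_per_fiber: int, gbps_per_band: int, fiber_pairs: int) -> int:
    """One-direction bandwidth in Gbps: each band on each fiber is one lane."""
    lanes = bands_per_fiber * fiber_pairs
    return lanes * gbps_per_band

# Current figures from the article: 8 bands x 32 Gbps x 8 fiber pairs.
one_way = aggregate_gbps(8, 32, 8)
print(one_way, 2 * one_way)   # Gbps one way, Gbps both ways

# Hypothetical illustration only (NOT Intel's roadmap): double all three metrics.
future_one_way = aggregate_gbps(16, 64, 16)
print(2 * future_one_way)     # Gbps both ways
```

The current figures give 64 lanes and 2,048 Gbps per direction (4,096 Gbps both ways), matching the "64 lanes of 32 Gbps" quoted earlier. Because the result is a product of three terms, doubling each one would multiply bandwidth by eight, which is one way a 4 Tbps design could grow toward 32 Tbps.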

△ Source: Intel

Finally, on what sets Intel's silicon photonics integration technology apart, Song Jiqiang explained:

The key is that we fabricate high-frequency laser emitters on the wafer and also integrate silicon optical amplifiers. These are the two core technologies, and both are manufactured at the wafer level.

Next, we can mass-produce such highly integrated lasers. An advantage of these on-chip lasers is that their light can be carried over ordinary optical fiber.

And in terms of stability, it takes almost 10 billion hours for an error to occur.

So what do you make of Intel's pick of "light"? Feel free to leave a comment below.

Reference Links:

[1]https://mp.weixin.qq.com/s/ozx_ficqlxjEPKa5AlBdfA

[2]https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Intel-Shows-OCI-Optical-I-O-Chiplet-Co-packaged-with-CPU-at/post/1582541

[3]https://www.youtube.com/watch?v=Fml3yuPR2AU
