The growth of data-intensive computing tasks requires processing units with higher performance and energy efficiency, but these requirements are becoming increasingly difficult to achieve using traditional semiconductor technologies. One potential solution is to combine the development of devices with innovations in system architecture.
Here, Academician Peng Lianmao and Professor Zhang Zhiyong of Peking University report a tensor processing unit (TPU) built from 3,000 carbon nanotube field-effect transistors that performs energy-efficient convolution and matrix multiplication. The TPU uses a systolic array architecture that supports parallel 2-bit integer multiply-accumulate operations. A five-layer convolutional neural network running on the TPU recognizes MNIST images with an accuracy of up to 88% at a power consumption of 295 μW. The team uses an optimized nanotube fabrication process that delivers 99.9999% semiconductor purity and an ultra-clean surface, yielding transistors with high on-current density and good uniformity. Based on system-level simulations, the authors estimate that an 8-bit TPU built from nanotube transistors at the 180 nm technology node could reach a clock frequency of 850 MHz and an energy efficiency of 1 tera-operations per second per watt (1 TOPS/W). The results were published in the latest issue of Nature Electronics under the title "A carbon-nanotube-based tensor processing unit".
Peng Lianmao, director of the Department of Electronics at Peking University's School of Information Science and Technology and an academician of the Chinese Academy of Sciences, once told The Paper in an exclusive interview at the Beijing Institute of Carbon-based Integrated Circuits: "We have been on the road of carbon-based integrated circuits for 20 years, and we have not seen any obstacle that makes us feel we cannot go on."
Mainland China has been working on carbon-based electronics since 2000. In 2007, the team of Academician Peng Lianmao and Professor Zhang Zhiyong at Peking University proposed a doping-free method for fabricating carbon nanotube CMOS devices and produced the first carbon nanotube transistor whose performance exceeded that of a silicon-based transistor of the same size. In 2017, the team reported in Science the first top-gate carbon nanotube field-effect transistor at the 5 nm technology node. Compared with a traditional silicon-based transistor of the same size, it offered roughly a tenfold advantage in intrinsic performance and power consumption, demonstrating the great potential of carbon nanotube electronics.
In May 2020, the team published again in Science. Using multiple rounds of purification and dimension-limited self-assembly, they prepared high-density aligned arrays of carbon nanotubes with a purity above 99.9999% on a four-inch substrate. These arrays meet the requirements of ultra-large-scale carbon nanotube integrated circuits and lay the foundation for the practical application and industrialization of carbon-based integrated circuits.
【Hardware Implementation of the CNT TPU】
The CNT TPU consists of a 3×3 array of processing elements (PEs), a control module, and input/output multiplexers. Each PE performs 2-bit integer multiply-accumulate (MAC) operations. The entire TPU contains approximately 3,000 CNT FETs. The fabrication process includes several steps that ensure high-performance CNT transistors: (1) High-purity carbon nanotube films: achieved with a multiple-dispersion sorting method. (2) Ultra-clean surface: ensured by combining annealing with wet cleaning.
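The MAC operation each PE performs can be sketched in a few lines of Python. This is only a behavioral toy model, not the circuit from the paper: the class name `ProcessingElement`, the unsigned operand range, and the accumulator width `acc_bits` are all assumptions made for illustration.

```python
class ProcessingElement:
    """Toy model of one PE: multiplies two 2-bit integers and accumulates.

    Assumptions (not from the article): operands are unsigned (0..3) and
    the accumulator wraps around at a hypothetical width of `acc_bits`.
    """

    def __init__(self, acc_bits=8):
        self.acc_bits = acc_bits
        self.acc = 0

    def mac(self, a, w):
        # Reject operands that do not fit in 2 bits.
        if not (0 <= a <= 3 and 0 <= w <= 3):
            raise ValueError("operands must fit in 2 bits (0..3)")
        # Multiply, add to the running sum, and wrap to the accumulator width.
        self.acc = (self.acc + a * w) & ((1 << self.acc_bits) - 1)
        return self.acc


pe = ProcessingElement()
for a, w in [(3, 3), (2, 1), (1, 0)]:
    pe.mac(a, w)
print(pe.acc)  # 3*3 + 2*1 + 1*0 = 11
```

The largest single product of two 2-bit operands is 3 × 3 = 9, so the accumulator has to be wider than the operands; how wide it is in the real chip is not stated in the text above.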
Figure 1 depicts the overall system architecture of the CNT TPU, showing the arrangement of the PEs, control module, and multiplexers. It includes SEM images of the fabricated CNT FETs and their structural diagrams, emphasizing the high uniformity and purity of the CNT network.
Figure 1: CNTFET-based digital computing system for tensor processing acceleration.
Figure 2 shows the electrical characteristics of a CNT FET, including its transfer and output characteristics, as well as the performance of basic logic gates such as inverters and NAND gates. It emphasizes the robustness and high performance of CNT-based logic gates.
Figure 2: Electrical characteristics of top-gate p-FETs and basic logic gates.
【Systolic Array Architecture and Convolution Mapping】
The systolic array architecture is a key element of the CNT TPU. Simple PEs are organized in a regular array, which reduces design complexity and improves fault tolerance. Each PE performs a MAC operation and passes its result to neighboring PEs in a mesh topology, so data flows efficiently and energy consumption drops. The architecture suits the convolution operations in neural networks well: input data and weights propagate through the array, partial sums are accumulated along the way, and the final outputs emerge sequentially. Figure 3 shows the internal structure of a PE in the systolic array, including components such as multipliers, adders, and registers. It also illustrates the data flow during convolution, with detailed SEM images and test signals.
Figure 3: Convolutional PE and data flow in the CNT TPU.
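The idea of partial sums travelling from PE to PE can be illustrated with a small cycle-by-cycle simulation. The sketch below is a simplification under stated assumptions: the nine PEs are modeled as a one-dimensional pipeline rather than the 3×3 mesh, each PE keeps one kernel weight fixed (a weight-stationary scheme), and the function name `systolic_conv` is hypothetical. The dataflow in the actual chip may differ.

```python
def systolic_conv(image, kernel):
    """Valid 2D convolution (CNN-style, no kernel flip) on a chain of PEs.

    PE p holds one fixed weight. In each cycle it takes the partial sum its
    left neighbor produced in the previous cycle, adds weight * pixel, and
    stores the result in its own register.
    """
    k = len(kernel)
    weights = [w for row in kernel for w in row]  # one weight per PE
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    # windows[n][p] is the pixel that PE p needs for output number n.
    windows = [[image[r + i][c + j] for i in range(k) for j in range(k)]
               for r in range(out_h) for c in range(out_w)]
    n_pe, n_out = len(weights), len(windows)
    regs = [0] * n_pe  # partial-sum register at the output of each PE
    results = []
    for cycle in range(n_out + n_pe - 1):
        # Update from the last PE to the first so that every PE reads the
        # value its neighbor held at the end of the previous cycle.
        for p in reversed(range(n_pe)):
            n = cycle - p  # output that PE p works on in this cycle
            if 0 <= n < n_out:
                incoming = regs[p - 1] if p > 0 else 0
                regs[p] = incoming + weights[p] * windows[n][p]
        n_done = cycle - (n_pe - 1)
        if 0 <= n_done < n_out:
            results.append(regs[-1])  # a finished output leaves the chain
    return [results[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
print(systolic_conv(image, kernel))  # [[9, 6], [4, 9]]
```

After the pipeline fills (nine cycles here), one finished output leaves the chain per cycle, which is the "produced sequentially" behavior described above.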
【Image Edge Extraction and Handwritten Digit Recognition】
To demonstrate the capabilities of the CNT TPU, the researchers implemented image edge extraction and handwritten digit recognition. The TPU performs both tasks with good accuracy and low power consumption: (1) Image edge extraction: image outlines are captured with 3×3 kernels. Applying multiple kernels captures more detail and shows that the TPU can handle complex image processing tasks. (2) Handwritten digit recognition: a five-layer convolutional neural network (CNN) was constructed that recognizes handwritten digits from the MNIST dataset with an accuracy of 88% while consuming only 295 μW. Figure 4 illustrates the results of image edge extraction with different kernels. It compares the contours captured by individual and combined kernels, demonstrating the TPU's ability to extract detailed edge information.
Figure 4: Image edge extraction using single and combined kernels.
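A short Python sketch can show why combining kernels captures more edge detail than a single kernel. The two kernels below are standard Prewitt-style kernels chosen purely for illustration; the article does not list the kernels used in the paper, and the helper names, the signed weights, and the threshold are assumptions.

```python
def convolve3x3(image, kernel):
    """Valid 3x3 convolution (no padding, no kernel flip)."""
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(3) for j in range(3))
             for c in range(w - 2)]
            for r in range(h - 2)]


# Illustrative Prewitt-style kernels (an assumption, not from the paper).
HORIZONTAL = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]  # responds to horizontal edges
VERTICAL = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]    # responds to vertical edges


def edge_map(image, kernels, threshold=2):
    """Mark a pixel as an edge if the summed absolute responses reach threshold."""
    responses = [convolve3x3(image, k) for k in kernels]
    h, w = len(responses[0]), len(responses[0][0])
    return [[1 if sum(abs(resp[r][c]) for resp in responses) >= threshold else 0
             for c in range(w)]
            for r in range(h)]


# A bright block in the lower-right corner of a dark image.
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]

print(edge_map(image, [HORIZONTAL]))            # [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
print(edge_map(image, [VERTICAL]))              # [[0, 0, 0], [1, 1, 0], [1, 1, 0]]
print(edge_map(image, [HORIZONTAL, VERTICAL]))  # [[1, 1, 1], [1, 1, 1], [1, 1, 0]]
```

Each single kernel finds only the top or the left side of the block, while the combined map outlines both sides and the corner.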
Figure 5 details the architecture and performance of the five-layer CNN implemented with the CNT TPU. It includes a comparison of power consumption and accuracy with other hardware systems, emphasizing the efficiency and accuracy of the TPU in handwritten digit recognition.
Figure 5: Five-layer CNN with the CNT TPU, performance metrics, and comparison of different systems.
【Matrix-Matrix Multiplication on the Systolic Array】
Matrix multiplication is another important application of the CNT TPU. The systolic array architecture has an inherent advantage over a traditional parallel matrix multiplier: it reuses input data as much as possible and minimizes data movement. This speeds up computation and reduces energy consumption, making the CNT TPU efficient for large-scale matrix operations. Figure 6 compares the data flow and energy consumption of the CNT TPU and a traditional parallel matrix multiplier during matrix multiplication, highlighting the energy efficiency and speed of the CNT TPU for large-scale matrix operations.
Figure 6: Matrix multiplication with the CNT TPU.
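The data reuse behind this advantage can be seen in a small simulation of a systolic matrix multiplier. The sketch below assumes an output-stationary scheme, in which each PE accumulates one element of the result while operands stream past it; this is a common textbook arrangement and not necessarily the one used in the paper. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Multiply A (n x k) by B (k x m) on an n x m grid of PEs.

    Rows of A enter from the left and columns of B enter from the top, each
    delayed by one cycle per row or column so that matching operands meet in
    the right PE. Every operand is fetched once and then handed from
    neighbor to neighbor.
    """
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]  # A operands, moving right
    b_reg = [[0] * m for _ in range(n)]  # B operands, moving down
    for t in range(n + m + k - 2):
        # Shift A operands one PE to the right; feed new ones on the left edge.
        for i in range(n):
            for j in reversed(range(m)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    s = t - i  # row i is delayed by i cycles
                    a_reg[i][0] = A[i][s] if 0 <= s < k else 0
        # Shift B operands one PE down; feed new ones on the top edge.
        for j in range(m):
            for i in reversed(range(n)):
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    s = t - j  # column j is delayed by j cycles
                    b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Every PE performs one MAC per cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


A = [[1, 2, 3], [0, 1, 2], [3, 0, 1]]
B = [[1, 0, 2], [2, 1, 0], [0, 3, 1]]
print(systolic_matmul(A, B))  # [[5, 11, 5], [2, 7, 2], [3, 3, 7]]
```

In this model each element of A is read from memory once and then used by all m PEs in its row, and each element of B by all n PEs in its column, which is the reuse that keeps data movement low.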
【Summary】
The CNT TPU represents a major advance in tensor processing units, combining high energy efficiency, scalability, and robust performance. By leveraging the unique properties of CNTs together with a systolic array architecture, the TPU is well suited to a wide range of data-intensive applications, from image processing to neural network computing. The fabrication process ensures high purity and uniformity of the CNTs, paving the way for future beyond-silicon computing technology.
Source: Frontiers of Polymer Science