Talk about the body and soul of autonomous driving

2022-04-19 09:05:15

Tesla's pay-to-unlock rear-seat heating, activated online, has repeatedly become a trending topic, and many people see it as yet another popular lesson in software-defined cars. The idea of the software-defined car has begun to take root!

So will the car of the future really be defined by software? Or is it really enough for the car of the future to be defined only by software?

Hardware and software --- if the skin is gone, where will the hair attach?

Software and hardware have always been as interdependent as lips and teeth. Hardware is the carrier of software, and software is the expression of hardware; software determines how well the hardware is controlled, and hardware sets the functional boundaries of the software.

Take Apple, the biggest celebrity in consumer electronics, as an example. Its core competitiveness is a software ecosystem built around its mobile operating system, yet at every product launch it spends a surprising amount of time showing off the cutting-edge hardware technology in its products. We feel the same thing in daily life: running the latest big games on an old computer barely works at all, and installing a new system on an old phone makes it so laggy that you start to doubt life. Software shines in the spotlight, while hardware quietly supports it behind the scenes.

Bringing autonomous driving to the road requires the computing power of hardware (AI chips).

Realizing autonomous driving relies on sensors, including ultrasonic sensors, cameras, millimeter-wave radar, and lidar, to collect information about the road environment. The collected data is sent to the car's central processor, which identifies obstacles, drivable road, and so on; based on those results, the system plans the path, sets the speed, and drives the car automatically. The whole process must be completed almost instantaneously, with latency held to the millisecond or even microsecond level to keep autonomous driving safe. Processing, feedback, decision planning, and execution on this timescale place very high demands on the central processor's computing power. The most intuitive example is the cameras that perceive the road environment: there are usually about 12 of them densely distributed around the body, and to identify obstacles the processor must analyze all of these camera streams in real time. A single 1080p high-definition camera can produce more than 1 Gbit of raw data per second, so the data volume is far from small. To accurately extract the useful information in images and video, the industry mostly uses deep learning neural networks.
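To make the data volume concrete, here is a back-of-the-envelope calculation of the raw (uncompressed) data rate. It assumes 24-bit RGB pixels and 30 frames per second; these are assumptions for illustration, not figures from the article, and real camera pipelines often use other pixel formats and compression.

```python
# Raw data rate of uncompressed camera streams.
# Assumptions (not from the article): 24-bit RGB (3 bytes/pixel), 30 fps.

def raw_rate_bytes_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed bytes produced per second by one camera."""
    return width * height * bytes_per_pixel * fps


per_camera = raw_rate_bytes_per_s(1920, 1080, 3, 30)
num_cameras = 12  # the "about 12" cameras mentioned above

print(f"one camera : {per_camera / 1e6:.1f} MB/s = {per_camera * 8 / 1e9:.2f} Gbit/s")
# one camera : 186.6 MB/s = 1.49 Gbit/s
print(f"12 cameras : {per_camera * num_cameras / 1e9:.2f} GB/s")
# 12 cameras : 2.24 GB/s
```

Under these assumptions a single 1080p stream is roughly 1.5 Gbit/s before compression, and a dozen cameras together deliver over 2 GB of pixels every second.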


Deep learning consists of two parts: training and inference

The fundamental idea of deep learning is to turn anything into vectors in a high-dimensional space, and a powerful neural network is a combination of countless matrix operations and simple nonlinear transformations. At its core, a deep neural network reduces the analysis process to multiply-accumulate (MAC) operations: multiply two values, add the product to the value held in an accumulator, and store the result back in the accumulator. The key theories behind deep learning are linear algebra and probability theory; the rest is brute-force computation. Deep neural networks, especially those with hundreds of layers, therefore place very high demands on high-performance computing, because the more computing power there is, the more information can be processed in a given time and the more accurate the decisions will be! Studies have shown that each additional level of autonomous driving requires an order of magnitude or more of extra computing power: L2 needs only about 2 TOPS (TOPS: tera, i.e. trillion, operations per second), while L5 requires more than 4,000 TOPS.
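A minimal sketch of the multiply-accumulate idea: one fully connected layer written as explicit MAC steps, plus a helper that turns a MAC count into a TOPS requirement. Counting each MAC as two operations (one multiply, one add) is a common convention, not something the article specifies, and the 10-GMAC-per-frame network is a purely hypothetical workload.

```python
def dense_layer(x: list[float], w: list[list[float]], b: list[float]) -> tuple[list[float], int]:
    """Compute relu(W @ x + b) one MAC at a time and count the MACs."""
    out: list[float] = []
    macs = 0
    for row, bias in zip(w, b):
        acc = bias                  # accumulator starts at the bias
        for wi, xi in zip(row, x):
            acc += wi * xi          # one MAC: multiply, then add into the accumulator
            macs += 1
        out.append(max(0.0, acc))   # simple nonlinearity (ReLU)
    return out, macs


def required_tops(macs_per_frame: float, fps: float, streams: int) -> float:
    """Tera-operations per second, counting each MAC as 2 ops (multiply + add)."""
    return 2 * macs_per_frame * fps * streams / 1e12


y, macs = dense_layer([1.0, 2.0], [[0.5, -1.0], [2.0, 1.0]], [0.0, 1.0])
print(y, macs)  # [0.0, 5.0] 4

# Hypothetical network: 10 GMACs per frame, 30 fps, 12 cameras.
print(f"{required_tops(10e9, 30, 12):.1f} TOPS")  # 7.2 TOPS
```

Every output neuron is a chain of MACs, so a deep network with many wide layers multiplies out to billions of MACs per frame, which is why the required TOPS grows so quickly with model size, frame rate, and camera count.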

Just as the performance of a traditional fuel vehicle is largely determined by its engine power, the quality of a future self-driving car will be largely determined by its digital engine: the AI chip!

Architectural classification of autonomous driving hardware (AI chips).

Because autonomous driving demands both high computing power and low power consumption, a traditional CPU control chip alone can no longer meet the needs of this field. The CPU's biggest advantage is flexibility: with the von Neumann architecture, it can load any software for millions of different applications. But precisely because the CPU is so flexible, the hardware cannot know what the next computation will be until it reads the software's next instruction. The CPU must save the result of each calculation internally in memory (registers or the L1 cache), and memory access becomes the weak link of the CPU architecture, known as the von Neumann bottleneck. Although every step in the large-scale computation of a neural network is completely predictable, each of the CPU's arithmetic logic units (ALUs, the components that contain and control the multipliers and adders) can only execute those steps one after another, accessing memory every time, which limits overall throughput and consumes a lot of energy. In short, although the CPU handles all kinds of computing tasks very efficiently, it can only handle a relatively small number of tasks at a time, so its computing speed cannot meet the needs of deep learning, which requires strong parallel matrix computation!
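The memory-traffic argument can be illustrated with a deliberately simplified toy model: a single scalar unit that loads its operands from memory and writes the intermediate result back after every MAC, as the paragraph describes. The `CountingMemory` class is hypothetical and only counts accesses; real CPUs keep accumulators in registers, use caches, and have SIMD units, so this exaggerates the effect to make the pattern visible.

```python
class CountingMemory:
    """Hypothetical memory that counts every load and store."""

    def __init__(self, data: dict[str, list[float]]):
        self.data = data
        self.reads = 0
        self.writes = 0

    def load(self, name: str, i: int) -> float:
        self.reads += 1
        return self.data[name][i]

    def store(self, name: str, i: int, value: float) -> None:
        self.writes += 1
        self.data[name][i] = value


def scalar_dot(mem: CountingMemory, n: int) -> float:
    """Dot product of mem['a'] and mem['b'], strictly one MAC after another."""
    mem.store("acc", 0, 0.0)
    for i in range(n):
        a = mem.load("a", i)
        b = mem.load("b", i)
        acc = mem.load("acc", 0)          # fetch the previous intermediate result
        mem.store("acc", 0, acc + a * b)  # write the new intermediate result back
    return mem.load("acc", 0)


n = 1000
mem = CountingMemory({"a": [1.0] * n, "b": [2.0] * n, "acc": [0.0]})
result = scalar_dot(mem, n)
print(result, mem.reads, mem.writes)  # 2000.0 3001 1001
```

In this model each useful MAC costs four memory accesses and the steps cannot overlap, even though the whole sequence was known in advance; that predictability is exactly what parallel and dedicated architectures exploit.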

The CPU can no longer meet the requirements of future autonomous driving chips

At present, the main control chips used for L3-and-above autonomous driving fall into three categories by technical architecture:

First, the graphics processing unit (GPU), represented by NVIDIA's DRIVE PX platform. A GPU is less efficient than a CPU at executing a single task and can handle a narrower range of tasks, but its power lies in performing many tasks at the same time, which gives it a natural advantage in large-scale parallel computation. For example, to multiply 3 floating-point numbers together once, the CPU beats the GPU; but to perform 1 million such multiplications, the GPU crushes the CPU. In practice, GPUs have been shown to significantly accelerate the training and classification of neural networks.
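The CPU-versus-GPU example can be captured by an idealized step-count model: a fast scalar core does one multiplication per step, while a GPU runs many slower lanes in lockstep. The lane count and per-step times are made-up parameters chosen only to show the crossover, not specifications of any real chip.

```python
import math

LANES = 4096        # hypothetical number of parallel GPU lanes
CPU_STEP_NS = 1.0   # hypothetical time per step on a fast scalar core
GPU_STEP_NS = 4.0   # hypothetical (slower) time per step on a GPU lane


def cpu_steps(n_tasks: int) -> int:
    """One scalar unit: one multiplication per step."""
    return n_tasks


def gpu_steps(n_tasks: int, lanes: int) -> int:
    """Lanes in lockstep: up to `lanes` independent multiplications per step."""
    return math.ceil(n_tasks / lanes)


for n in (1, 1_000_000):
    cpu_ns = cpu_steps(n) * CPU_STEP_NS
    gpu_ns = gpu_steps(n, LANES) * GPU_STEP_NS
    print(f"{n:>9} tasks: CPU {cpu_ns:>9.0f} ns, GPU {gpu_ns:>5.0f} ns")
```

With these numbers the CPU finishes a single multiplication four times faster, but for a million independent multiplications the GPU needs only 245 steps and comes out about a thousand times ahead. The model only holds because each multiplication is independent of the others.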

But GPUs still have four limitations when applied to deep learning algorithms:

1, Inference cannot fully exploit parallel computing. Deep learning involves two computing stages, training and inference. GPUs are very efficient at training deep learning algorithms, but when running inference on a single input, their parallelism cannot be fully exploited;

2, The hardware structure cannot be flexibly reconfigured. The GPU uses the SIMT (single instruction, multiple threads) computing model, and its hardware structure is relatively fixed. Deep learning algorithms are not yet fully stable, and if they change significantly, the GPU cannot reconfigure its hardware to match;

3, The GPU is still a general-purpose processor, which brings us back to the basic problem: the von Neumann bottleneck. In each computation spanning thousands of ALUs, the GPU must access registers or shared memory to read and save intermediate results. So the more parallel computation the GPU performs on its ALUs, the more energy it proportionally spends on memory access, and the complex wiring also enlarges its physical footprint. To raise speed, GPUs stack more cores, so size is not an advantage;

4, GPU power consumption is huge. NVIDIA's Drive PX and Xavier are powerful, but their overall power consumption reaches 250 W, which puts pressure on the car's power system. In a gasoline vehicle, power-hungry equipment such as the air conditioning generally runs on electricity generated only while the engine is running. If the autonomous-driving control core must draw hundreds of watts, then even though most of its computing functions could in theory be switched off while idling to save power, it would still put considerable pressure on a fuel vehicle's conventional battery. Even in an electric vehicle, if non-drivetrain components draw that much electricity, driving range will drop somewhat. And once autonomous driving is switched on, these control cores must compute the surroundings, track changes, and react to the driving situation at all times; in principle they have to run continuously at full capacity, with no chance to enter a low-power rest mode.
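How much range does a constant 250 W compute load cost an electric vehicle? The sketch below assumes a 75 kWh battery, 15 kWh per 100 km, and a 60 km/h average speed; all three vehicle figures are hypothetical assumptions, and only the 250 W comes from the text above.

```python
def range_km(battery_wh: float, drive_wh_per_km: float, speed_kmh: float, aux_w: float) -> float:
    """Range when a constant auxiliary load `aux_w` runs alongside driving."""
    drive_w = drive_wh_per_km * speed_kmh   # average traction power at this speed
    hours = battery_wh / (drive_w + aux_w)  # time until the battery is empty
    return hours * speed_kmh


BATTERY_WH = 75_000   # assumed 75 kWh pack
WH_PER_KM = 150       # assumed consumption: 15 kWh per 100 km
SPEED_KMH = 60        # assumed average speed

base = range_km(BATTERY_WH, WH_PER_KM, SPEED_KMH, aux_w=0)
with_chip = range_km(BATTERY_WH, WH_PER_KM, SPEED_KMH, aux_w=250)  # 250 W compute load
print(f"{base:.0f} km -> {with_chip:.1f} km ({(1 - with_chip / base) * 100:.1f}% shorter)")
```

Under these assumptions the range falls from 500 km to about 486.5 km, roughly 2.7%. Because the load is fixed per hour rather than per kilometer, slower city driving would make the penalty larger.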

Autonomous driving chips need to integrate huge amounts of data in real time to assess the driving environment and decide on a driving strategy

Second, the application-specific integrated circuit (ASIC), represented by Horizon's Journey series. The computing power and computing efficiency of an ASIC are customized directly to the needs of specific algorithms, giving it small size, low power consumption, high reliability, strong confidentiality, high computing performance, and high computational efficiency. As a result, in the specific application field it targets, an ASIC's energy efficiency far exceeds that of general-purpose chips such as CPUs and GPUs, as well as semi-custom FPGAs.

The Horizon Journey 3 chip delivers 5 TOPS of computing power while consuming only 2.5 W

Take the Mobileye EyeQ4 chip used in the NIO ES8 as an example: its peak computing rate is 2.5 TOPS, while its power consumption is only 3 W. The Audi A8, Volvo XC90, Tesla Model S, and other models with self-driving features use the Mobileye EyeQ3 chip, with a peak computing rate of 0.256 TOPS at 2.5 W, which can also meet the computing needs of L2 to L3 autonomous driving.
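Dividing the quoted peak figures gives a rough energy-efficiency comparison in TOPS per watt. Treat this as indicative only: these are vendor peak numbers, and vendors do not necessarily count operations at the same numeric precision or on the same workload.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Peak energy efficiency: tera-operations per second per watt."""
    return tops / watts


# Peak figures as quoted in the article: (TOPS, watts).
chips = {
    "Horizon Journey 3": (5.0, 2.5),
    "Mobileye EyeQ4": (2.5, 3.0),
    "Mobileye EyeQ3": (0.256, 2.5),
}

for name, (tops, watts) in chips.items():
    print(f"{name:<18} {tops:>6.3f} TOPS / {watts:.1f} W = {tops_per_watt(tops, watts):.3f} TOPS/W")
```

By this measure the Journey 3 reaches about 2 TOPS/W, the EyeQ4 about 0.83 TOPS/W, and the EyeQ3 about 0.1 TOPS/W.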

In addition to Horizon's Journey series and Mobileye's EyeQ series, Google's TPU series and Cambricon's Cambricon-1M series are also ASICs.

Of course, the disadvantages of ASICs are also obvious. Because an ASIC is designed for specific algorithms, the algorithms it supports are fixed once the chip is designed, so if the algorithm changes, the chip may become unusable. However, as autonomous driving software and algorithms become more mature and stable, car companies will choose to independently develop ASICs that match their own technical solutions.

Comparison of performance and flexibility for different architectures

Third, the field-programmable gate array (FPGA), represented by Xilinx's Zynq series. FPGAs evolved from earlier programmable devices such as PAL, GAL, and CPLD; their function is defined by loading ("burning") a configuration file into the chip, which sets up the logic gates and the wiring between them and the memory. FPGAs can perform both data-parallel and task-parallel computation, enabling higher concurrency than GPUs. They excel at dense processing and highly concurrent workloads, and they consume less power than CPUs and GPUs. Although FPGAs are highly regarded, they were not developed specifically for deep learning algorithms, and they face many limitations in practice:

1. Each basic unit has limited computing power. To be reconfigurable, an FPGA contains a large number of very fine-grained basic units, but the computing power of each unit (which relies mainly on lookup tables, LUTs) is much lower than that of the ALU modules in CPUs and GPUs;

2. The proportion of computing resources is relatively low. To achieve reconfigurable features, a large number of resources within the FPGA are used for configurable on-chip routing and wiring;

3. Compared with dedicated custom chips (ASICs), there is still a big gap in speed and power consumption;

4. FPGAs are more expensive than ASICs, and at volume the unit cost of an FPGA is much higher than that of a dedicated custom ASIC.
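To see why an FPGA's fine-grained units carry little computing power each, here is a minimal model of a k-input lookup table (LUT): the cell simply stores a 2^k-entry truth table, and "configuring" it means filling in that table. Building even a 4-bit adder takes several LUTs, where a CPU ALU does the same addition in one instruction. This is a simplified sketch; real FPGAs also have dedicated carry chains and DSP blocks that are not modeled here.

```python
from typing import Callable


class LUT:
    """A k-input lookup table: the truth table is the whole 'configuration'."""

    def __init__(self, k: int, func: Callable[..., int]):
        self.k = k
        # Entry i holds func applied to the bits of i (bit 0 is the first input).
        self.table = [func(*[(i >> j) & 1 for j in range(k)]) & 1 for i in range(2 ** k)]

    def __call__(self, *bits: int) -> int:
        index = sum(bit << j for j, bit in enumerate(bits))
        return self.table[index]


# One full-adder bit needs two 3-input LUTs: one for the sum, one for the carry.
SUM = LUT(3, lambda a, b, c: a ^ b ^ c)
CARRY = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))


def ripple_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Add two `width`-bit numbers with LUTs only; return (sum, LUTs needed)."""
    carry, total = 0, 0
    for j in range(width):
        a, b = (x >> j) & 1, (y >> j) & 1
        total |= SUM(a, b, carry) << j
        carry = CARRY(a, b, carry)
    total |= carry << width
    # In hardware every bit position needs its own pair of LUTs.
    return total, 2 * width


print(ripple_add(11, 6, width=4))  # (17, 8)
```

The same pattern scales linearly: a 32-bit addition built this way would occupy 64 LUTs, plus the configurable routing between them, which is also why so much of an FPGA's area goes to wiring rather than arithmetic.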

Hardware and software --- body and soul

If software is likened to the soul of the future car, then the hardware that runs the software is the body on which the soul depends.

Just as the soul and the body are inseparable, software and hardware are fused together in symbiosis.

Without high-performance hardware, software cannot realize its advantages; without sufficiently optimized software, even powerful hardware has nowhere to show its performance. Only smart software paired with powerful hardware can finally bring autonomous driving technology to the road in the service of people.

And the car of the future will be an organism with a strong body and a smart brain!

Smart cars are software and hardware organisms

Now back to the question at the beginning:

Will the car of the future be defined by software?

The answer is yes, because software is a very important part of the car of the future!

The answer is also clearly no, because the car of the future is defined not only by software but also by hardware!

Software defines the car, and hardware determines the software!

Reproduced from Automotive Electronics and Software (Zhihu). The views in this article are shared for discussion only and do not represent the position of this public account. For copyright or other issues, please let us know and we will deal with them promptly.

-- END --
