
Is the ChatGPT moment for self-driving?

Author: ABRMOOK

Words / Wu Gansha (Co-founder, Chairman and CEO of UISEE)

Edited by Tu Yanping

Design / Zhao Haoran

Editor's note

On the afternoon of June 15, the second day of the 16th China Automotive Blue Book Forum, Wu Gansha, Co-founder, Chairman and CEO of UISEE, delivered a keynote speech on how autonomous driving startups should respond to a possible large-model ChatGPT moment.

"Maybe the large model is the end game of true autonomous driving," he said. Musk has said Tesla's version 12.4 performs 5 to 10 times better. Does that imply a huge jump in model scale? Will multimodal models with billions to tens of billions of parameters emerge?

Wu Gansha said, "If Tesla fails, that is, if after tens of billions of dollars of investment the system still does not converge and its FSD improvement curve begins to flatten, it may face shocking pressure in the stock market. But if it succeeds, the companies on this track, large and small, may all be left behind."

In his speech, he talked about UISEE's response strategy as an autonomous driving startup.

The following is a transcript of Wu Gansha's speech, with abridgement.


Thank you very much to Auto Business Review for the invitation; it is a pleasure to be at the Blue Book Forum again. Hello to all colleagues and friends from the media. Due to time constraints, I will talk about only two questions: first, everyone says we are now facing the ChatGPT moment of large-model autonomous driving, so will it happen? Second, how should we, as an autonomous driving startup, respond?


Is the ChatGPT moment coming?

Are we really facing such a ChatGPT moment?


This is a statistical curve of data voluntarily uploaded by Tesla owners. It shows city FSD performance, with a rapid improvement between versions 11.4 and 12.3. Of course, this data changes constantly, but it works out to roughly one "dangerous takeover" every 200-plus kilometers.

Now look at Xpeng, which is relatively leading in China. He Xiaopeng put it quite candidly: on highways it can reach one takeover per 1,000 kilometers, while in cities it is less than 10 kilometers per takeover.

At first glance it feels like Tesla is indeed pulling away quickly. But look closer at version 12.3.6: its overall rate is actually one takeover per 31 kilometers, and one per 134 kilometers on highways.

On the one hand it is improving rapidly; on the other, if we distinguish dangerous takeovers from ordinary takeovers, its ordinary-takeover numbers are not far ahead. What's more, China's road conditions are much more complicated than those of the United States.

Look at the 2015 data on traffic fatalities per 100,000 vehicles per year: China is actually far above the United States and Germany, which means China's traffic conditions are much more complicated. Comparing one takeover per 31 kilometers with one takeover per less than 10 kilometers, you cannot say Tesla is far ahead of Xpeng.

So far, then, we probably have no way to draw a very precise conclusion, unless, as in today's news that ten of Tesla's FSD cars will run in Shanghai, the two are tested on the same roads. Otherwise it is a comparison of Guan Gong with Qin Qiong, two warriors from different eras who never actually fought.

So why do we still raise the question of whether a breakthrough moment is at hand? Because we have recently seen some statements Musk made to investors:

First, over the past two years their training compute has grown by more than 10 times, an order of magnitude: from the earlier 5,760 A100s to perhaps 85,000 H100s by the end of this year. That is a billion-dollar investment.

Second, the training data has grown by more than 10 times. Dojo started with 1 million 10-second video clips, but by his latest interview that had reached tens of millions of videos.

Third, on-vehicle compute has grown by almost 5 times, from HW3.0's 144 TOPS (HW3.0 can only run a model of roughly 100 million parameters) to HW4.0's 720 TOPS, with special optimization for the Transformer.
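The three scaling claims above can be sanity-checked with quick arithmetic. The GPU counts and TOPS figures are the ones quoted in the talk; the roughly 3x per-card H100-versus-A100 training speedup is my own assumption, so treat the result as an illustration rather than a precise figure.

```python
# Back-of-envelope check of the scaling figures quoted above.
a100_count, h100_count = 5_760, 85_000
h100_vs_a100 = 3.0  # assumed rough per-card training speedup (not from the talk)

# Cluster-level training compute gain: card-count ratio times per-card gain.
compute_gain = (h100_count * h100_vs_a100) / a100_count
print(f"training compute: ~{compute_gain:.0f}x")  # well above the 10x cited

# On-vehicle compute gain: HW4.0's 720 TOPS over HW3.0's 144 TOPS.
onboard_gain = 720 / 144
print(f"on-vehicle compute: {onboard_gain:.0f}x")  # the ~5x cited
```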

So we cannot help wondering: is a huge jump in model scale coming? Going from today's 100 million parameters to billions, will emergent abilities (reasoning, generalizing by analogy, and so on) appear? That is what we are particularly looking forward to seeing right now.

Musk teased in May that their version 12.4 could improve performance by 5 to 10 times. Put these numbers together: training compute up 10 times, data up 10 times, model scale up 10 times, performance up 10 times. It is very interesting that this seems to be actually happening.

Moreover, compare this with how large language models are trained: say, 10 trillion tokens, with tens of thousands of GPUs training for 100 days of pre-training, then Supervised Fine-Tuning under expert supervision, and finally Reinforcement Learning from Human Feedback (RLHF).

This is very similar to how we learn to drive. We also have a pre-training process before driving school: in my first 18 years I did not learn to drive, but I acquired common sense, which formed my world view and my cognitive model. Those 18 years of life experience are like pre-training. Then at 18 I went to a driving school and found an instructor to teach me, which is like fine-tuning under the supervision of an expert. Then I got my license, bought my own car, and started on the road as a novice, bumping along, practice making perfect, slowly driving better and better. That is like an ongoing process of reinforcement learning from feedback.

So perhaps the large model is the end game of true autonomous driving. The many corner cases we talk about today may never be exhausted by manpower, but perhaps they can be by such a method.


In 2017, when I attended CVPR with Xudong (Momenta CEO Cao Xudong), we were already talking about end-to-end. At the time my idea was that the large model is like our System 2: it needs high compute and high power consumption to reason through the most difficult and rarest traffic situations and finally solve them. End-to-end, by contrast, is like System 1, analogous to our human driving instincts. Most of the time we drive with other things on our minds, listening to music, driving on very low power and very low compute; that is the end-to-end model. It cannot be ruled out that this combination is the model for the end game of future autonomous driving.

Of course, if Tesla fails, that is, if tens of billions of dollars of investment still does not produce convergence and its FSD improvement curve begins to flatten, it may face shocking pressure in the stock market, because selling 2 million cars a year may not justify such a high valuation. But if it succeeds, the companies on this track, large and small, may all be left behind. We will have to wait and see.


Differentiated competition

We are a company focused on L4 commercial vehicles, but since our founding in 2016 we have also been working on passenger cars. The team is very small: Xudong just mentioned 1,300 people; we are less than one-tenth of that. How does such a small team do passenger cars? I will share that with you today.

There is no doubt that we cannot match FSD's investment intensity, so we compete through differentiation: benchmark against EAP and push the intelligence-to-price ratio to the extreme. For example, can we deliver EAP-level capability on a 100,000-yuan car?


What is EAP? As you can see, Tesla's intelligent driving comes in three tiers: basic AP at the top of the chart, EAP in the middle, and FSD at the bottom. EAP is what we usually call highway NOA with integrated driving and parking; it is priced at 32,000 yuan, while FSD is 64,000 yuan.

Today's FSD, or city NOA, is in the process of going from 90 points to 99 points, which requires huge investment. On the other hand, the 32,000-yuan EAP, highway NOA with integrated driving and parking plus commuting memory driving, may be in the process of going from 99 points to 99.99. So can that system be built for 3,000 yuan instead of 32,000 yuan? That is another place worth exploring.

On one hand, push the experience from 99 points to 99.99; on the other, greatly reduce the cost. We have done some exploration here too.


This is our passenger-car product line, with an all-in-one unit at the low end. The all-in-one line is basically built on Horizon chips, starting from J2 at the low end, with 2-megapixel and 8-megapixel variants. What I just described is the mid-range product, an integrated driving-and-parking system. This is actually the controller form that we, coming from L4, want to build toward together with city NOA.

In the middle there is an extremely low-cost product, J2 plus E3, which achieves highway NOA plus APA based on ultrasonic radar: basic driving-parking integration at the lowest possible cost.

Further up is J3 plus E3, which we call the "ultimate intelligence-to-price ratio"; it adds fusion APA and memory parking on top. Then there is a variant of this with a TDA4 in the middle: highway NOA plus memory driving plus memory parking. And at the top, city NOA is added. That is the product line.

We adopt a design approach with extremely high module reuse, which gives us great flexibility when working with OEMs and Tier 1s. We can provide algorithms or software modules; complete software packages and services; hardware reference designs for our partners; or integrated software-and-hardware solutions. Both our basic driving-parking integration and the "ultimate intelligence-to-price ratio" version cost just a few thousand yuan, yet deliver an experience benchmarked against EAP.

Along the way, let me also introduce our methodology. From the start we required this team to be modular: the software is highly modular and reusable, and the hardware can support all kinds of computing platforms, from J3 to TDA to NXP to Infineon, including domestic chips such as SemiDrive. In short, the hardware adapts to many brands and the software is highly modular.

But in the first seven years we essentially ran two separate tracks: driving and parking were done separately. We then built a software architecture that integrates driving and parking, the product of completely re-architecting from scratch. It is also based on SOA, which further improves development efficiency and functional scalability.

At the same time, we have done a lot of work. Let me introduce a little bit here.

On such an extreme intelligence-to-price platform, one J3 plus one E3, it is difficult to use data-driven methods and neural networks anywhere beyond perception. But if you rely on hand-written rules, a great deal of data goes to waste, because people have no time to process it, so it is inefficient. Yet a data-driven neural network has a relatively low safety rating: it can only reach QM, with no way to achieve a higher safety level.

Turing Award winner Joseph Sifakis has in fact asked this question: why is autonomous driving so hard? The discussion eventually converged in one direction, combining model-based, rule-based, and data-driven neural-network methods, and asking whether such a hybrid can run on extremely low-end chips.


Take target selection as an example. We can run such a system on an MCU: on one side a data-driven LSTM (Long Short-Term Memory) network, on the other a rule-based module, plus a synthesizer. The neural network runs on one MCU core, while the rules and the synthesizer run on another. The neural network is QM, while the rule-based part is ASIL D.

Together they achieve an overall ASIL D functional safety level. At the same time, the code and data footprint is only a few hundred KB, and the system can pass ISO 26262 certification.
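The dual-channel arrangement described above can be sketched in a few lines. This is a minimal illustration, not UISEE's implementation: the "neural" channel is a hand-written stand-in for the small LSTM, and every signal name and threshold here is invented for the example. The point is the decomposition: a QM-rated predictor adds early detections, an ASIL D-rated rule channel guarantees the baseline, and a synthesizer combines them.

```python
# Hedged sketch of a dual-channel target selector: a data-driven
# channel (stand-in for the QM-rated LSTM) plus a rule-based channel
# (ASIL D), combined by a synthesizer. All values are illustrative.

def neural_channel(track):
    """Stand-in for the LSTM: anticipate where the track will be in
    ~2 s and score it as a target to follow (0..1)."""
    predicted = track["lateral_offset_m"] + 2.0 * track["lateral_velocity_mps"]
    return 1.0 - min(abs(predicted) / 3.5, 1.0)

def rule_channel(track):
    """Rule-based baseline: a track is the target only if it is already
    inside the ego lane (within 1.75 m of the lane centre)."""
    return abs(track["lateral_offset_m"]) < 1.75

def arbitrate(track, nn_threshold=0.7):
    """Synthesizer: the rules give the guaranteed baseline; the neural
    channel may additionally flag a target it anticipates earlier."""
    return rule_channel(track) or neural_channel(track) >= nn_threshold

cutting_in = {"lateral_offset_m": 3.0, "lateral_velocity_mps": -1.5}
steady_neighbour = {"lateral_offset_m": 3.0, "lateral_velocity_mps": 0.0}
print(arbitrate(cutting_in))       # True: flagged ~2 s before lane entry
print(arbitrate(steady_neighbour)) # False: neither channel selects it
```

In the real system the two channels would run on separate MCU cores; here the split is just function boundaries.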

Can such a fused system, data-driven for higher performance on one hand and extremely low-cost on the other, meet the requirements of SOD?

In other cases, we use generative adversarial networks to continuously generate higher-quality data, for example for data selection and for planning and control, where we do not have much data today.

Here is an example: an algorithm fusing a very small neural network with a rule-based approach, handling a vehicle that is cutting in. You can see that the small-neural-network version detects the cut-in intent more than 2 seconds earlier than the purely rule-based one. Overall, false negatives are greatly reduced and recall improves by 50%.
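The recall claim can be made concrete with a small calculation. The counts below are hypothetical; only the 50% relative gain mirrors the figure in the talk.

```python
# Recall = share of real events (here, cut-ins) that the system catches.
def recall(true_positives, false_negatives):
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts: out of 100 real cut-ins, rules alone catch 60,
# while the fused rules + small-LSTM system catches 90.
rules_only = recall(60, 40)  # 0.60
fused = recall(90, 10)       # 0.90
print(f"relative recall gain: {fused / rules_only - 1:.0%}")  # 50%
```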

We apply this system to many other functions as well. For example, this is a pure-vision AEB; it achieved the five-star+ rating and can brake to a full stop from 85 km/h.


Follow the first echelon

We still have to keep up with the first echelon: stay at the forefront in algorithms while still guaranteeing modular delivery.

Over the past few years Tesla has made many innovations around the BEV Transformer, including the map-free Lanes Network, moving from single frames to video streams, the Occupancy Network, and so on; the next step is to turn the remaining modules into neural networks and finally realize a unified end-to-end network.

We have been following these algorithms too. For example, a BEV + Transformer + Occupancy Network system of ours recently won first place in the international RoboDrive Challenge. We have many such algorithms, the performance is quite good, and we can deliver them as modules.


Serve key customers

A startup has no way to invest in that many GPUs, nor does it have that much data. But who does? Our major customers may, especially some large OEMs, which have both data and compute. We can provide them with software services such as a closed data loop, an operations and maintenance platform, and a large-scale training platform.

Because we do L4, you know L4 needs a very good data closed loop, because it must iterate quickly. We therefore have a black-box data storage system (DSSAD) on the vehicle side and a good autonomous-driving training platform in the cloud.

Especially since last year, we have been applying large-model technologies such as scene understanding, pre-labeling, and data mining. This is the kind of platform a typical intelligent-driving company or OEM should have. We can deliver it as cloud containers, or as a physical all-in-one machine. Some companies may not want to use the cloud, and their data volume may fit a 24-to-192-card all-in-one machine; we can deliver such a machine to ensure rapid deployment and out-of-the-box use. That is the first point.

Second is our operations and maintenance platform, which I think is more distinctive. Why? L4 systems were an early attempt at the subscription service model: after selling such a system, because an AI driver has been added to it, we can charge a small yearly salary for that AI driver, which is a subscription service.

But if the subscription service is not run well, you cannot guarantee customer satisfaction. Suppose a vehicle works 20 hours a day, with only 4 of the 24 hours reserved for maintenance, and must hit 99.99% availability over those 20 hours; that means only about one hour per year out of working condition, a very demanding requirement.
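The availability arithmetic quoted here works out as follows, using only the figures from the talk:

```python
# 99.99% availability over a 20-hour working day, as described above.
scheduled_hours_per_year = 20 * 365                 # 7,300 scheduled hours
downtime = scheduled_hours_per_year * (1 - 0.9999)  # allowed annual downtime
print(round(downtime, 2))  # 0.73 hours, i.e. under one hour per year
```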

On the other hand, in an L4 system the lidar may cost tens of thousands of yuan, the domain controller tens of thousands of yuan, and the annual subscription fee is also in the tens of thousands. Without a good operations platform, the subscription service ultimately loses money. So we have built a good cloud service platform, and this operations and maintenance capability can also be offered to customers.


Finally, a summary: how should a small team like ours do passenger-car intelligent driving?

First, we are very flexible: we can provide hardware reference designs, complete software packages, or algorithms and software for individual modules; or, since we have no data and no compute ourselves, we can provide data and cloud services to customers who do.

Second, we have very good algorithms and have closely followed Tesla's SOTA approach. We can deliver our algorithm modules individually, deliver a complete integrated software-and-hardware product, or deliver a software package plus a hardware reference design.

This product focuses on EAP-style highway NOA, plus driving-parking integration, plus memory driving. In this form we hope to achieve the ultimate intelligence-to-price ratio, reach down to 100,000-yuan platforms, and support Tier 1s or OEMs in creating such products.

That's what I'm sharing, thank you!
