
In the CVPR Autonomous Driving Challenge, the global championship was taken away by the computing power player

Author: QbitAI (量子位)

Yunzhong, reporting from Aofei Temple

QbitAI | WeChat official account QbitAI

Inspur Information AI team has won another championship in the field of autonomous driving!

Not long ago, CVPR, the top academic conference in computer vision, drew to a close before a global audience and officially announced its best paper and other awards. Alongside the ten outstanding papers, another high-profile event, the international autonomous driving challenge, concluded its final showdown at the same time.

In the Occupancy & Flow track of the CVPR 2024 Autonomous Driving International Challenge, the Inspur Information AI team stood out from more than 90 top AI teams worldwide with a score of 48.9% and took the crown.

This is another demonstration of the team's strength in Occupancy technology, following its first-place finishes on the nuScenes 3D object detection leaderboard in 2022 and 2023.


△Figure 1 - Inspur Information AI team won the first place in the grid and motion estimation track

The CVPR 2024 Autonomous Driving International Challenge is an important part of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, focusing on technological innovation and applied research in autonomous driving. This year's challenge features seven tracks across three directions: perception, prediction, and planning.

The Occupancy & Flow track entered by the Inspur Information AI team was the most closely watched track of this year's challenge. Focused on perception tasks, it attracted more than 90 top AI teams from 17 countries and regions.

The competition provides large-scale occupancy grid data and evaluation standards based on the nuScenes dataset. Teams must use camera images to predict the occupancy and flow of a voxelized 3D space, which tests a perception system's ability to represent highly dynamic and irregular driving scenarios.
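To make the task concrete, here is a minimal NumPy sketch of what an occupancy-and-flow prediction looks like: a boolean voxel grid plus a per-voxel velocity field, scored with a plain voxel IoU. The grid size, values, and metric are illustrative only; the challenge's actual grids are far larger and its RayIoU metric additionally casts rays through the volume.

```python
import numpy as np

# Toy occupancy grid: 4 x 4 x 2 voxels around the ego vehicle.
# True = occupied. A real nuScenes-based grid is far larger (e.g. 200 x 200 x 16).
gt = np.zeros((4, 4, 2), dtype=bool)
gt[1:3, 1:3, 0] = True           # a 2x2 object on the ground layer

pred = np.zeros_like(gt)
pred[1:3, 1:4, 0] = True         # prediction overshoots by one column

# Per-voxel flow: a 3-vector (m/s) per voxel; static voxels carry zeros.
flow = np.zeros(gt.shape + (3,), dtype=np.float32)
flow[1:3, 1:3, 0] = [2.0, 0.0, 0.0]   # object moving 2 m/s along x

# Plain voxel IoU (RayIoU refines this by evaluating along projected rays,
# but the intersection-over-union idea is the same).
inter = np.logical_and(gt, pred).sum()
union = np.logical_or(gt, pred).sum()
iou = inter / union
print(f"voxel IoU = {iou:.3f}")   # 4 occupied voxels overlap out of 6 -> 0.667
```

The point of the representation is that occupancy and motion are predicted for every cell of space, not just for boxed objects.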

Occupancy: the challenge of finer-grained environment perception and prediction

Complex road layouts, diverse vehicle types, and dense pedestrian flows are the reality of urban road traffic and the practical challenges facing autonomous driving. Meeting them requires effective obstacle recognition and avoidance strategies, along with perception and understanding of the 3D environment.

Traditional 3D object detection methods usually represent an object's position and size with a bounding box. For objects with complex geometry, however, this often fails to capture their shape, and it also ignores background elements. Perception based on 3D bounding boxes therefore can no longer meet the demands of accurate perception and prediction in complex road environments.

As a newer autonomous driving perception approach, occupancy networks let the system determine the position and shape of objects in 3D space from voxel occupancy information, and thus effectively identify and handle obstacles that are unlabeled or have complex shapes, such as odd-shaped vehicles, stones on the road, or scattered cardboard boxes.

An occupancy grid network lets an autonomous driving system understand its surroundings more accurately: it not only recognizes objects but also distinguishes static from dynamic ones. Representing the 3D environment with high resolution and accuracy is key to improving the safety, accuracy, and reliability of autonomous driving in complex scenarios.

As shown in the figure below, a 3D object detection algorithm can only give the excavator's overall bounding box (left), while an occupancy grid network describes its specific geometry far more accurately (right).

△Figure 2 - 3D bounding-box detection of an excavator (left) vs. its occupancy grid representation (right)

The Inspur Information AI team achieved the highest results in the track

In the Occupancy & Flow track, the Inspur Information AI team took first place with a score of 48.9%.

Specifically, the team's "F-OCC" algorithm model delivered the strongest performance in the track through advanced model architecture design, data processing, and operator optimization, achieving the best results on both evaluation metrics: RayIoU (grid occupancy evaluated along projected rays) and mAVE (mean average velocity error).
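Of the two metrics, mAVE is the simpler to state: the mean Euclidean error between predicted and ground-truth velocities, where lower is better. The sketch below computes it over a handful of hypothetical moving voxels; the numbers are invented for illustration and the real benchmark aggregates over all foreground voxels.

```python
import numpy as np

# Hypothetical per-voxel 2D velocities (m/s) for three moving voxels.
gt_vel   = np.array([[2.0, 0.0], [1.5, 0.5], [0.0, -1.0]])
pred_vel = np.array([[1.8, 0.1], [1.5, 0.0], [0.2, -1.1]])

# mAVE: mean L2 distance between predicted and true velocity vectors.
ave = np.linalg.norm(pred_vel - gt_vel, axis=1).mean()
print(f"mAVE = {ave:.3f} m/s")
```

A model that predicts occupancy well but flow poorly gets a good RayIoU and a bad mAVE, which is why the track scores both.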

A more concise and efficient model architecture achieves a breakthrough in computing efficiency and detection performance

First, the model adopts a perception architecture based on forward projection and uses FlashInternImage, an efficient and high-performing backbone.

At the same time, through end-to-end optimizations such as hyperparameter tuning and operator acceleration, the team obtained the highest scores in both grid occupancy and motion estimation while improving the model's computing efficiency and speeding up iteration and inference.

In real-world applications, this improvement lets the model process large-scale 3D voxel data more quickly and efficiently, so autonomous vehicles can better understand their environment and make more accurate, real-time decisions.


△Figure 3 - Architecture diagram of the F-OCC algorithm model

More powerful and complete data processing comprehensively improves detection capability

In terms of data processing, the voxel labels provided by the competition contain many points that cannot be observed in the images, such as voxels occluded by other objects or hidden inside objects. During training, these interfere with a prediction network that learns from image data.

In the training data, the Inspur Information AI team generated a visibility mask by simulating LiDAR beams, which improved the model's prediction accuracy. In addition, by bringing voxels at the edge of the perception range into training, the team effectively eliminated false detections in that edge region and improved the model's overall detection performance by 11%.
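The visibility-mask idea can be illustrated with a tiny ray-marching sketch: a simulated beam walks outward from the sensor and everything behind the first occupied cell is marked occluded, so those voxels can be excluded from the training loss. This is a hypothetical one-beam, axis-aligned toy; the team's actual mask generation from simulated LiDAR beams is not public here.

```python
import numpy as np

# Toy 1 x 8 occupancy slice along a single beam from the sensor at x = 0.
grid = np.zeros((1, 8), dtype=bool)
grid[0, 3] = True                  # obstacle at cell 3

visible = np.zeros_like(grid)
# March the "LiDAR beam" cell by cell until it hits something.
for x in range(grid.shape[1]):
    visible[0, x] = True           # the beam reaches this cell
    if grid[0, x]:                 # ... and is blocked by it
        break

# Cells 0..3 are visible; cells 4..7 are occluded and would be
# masked out when supervising a camera-based prediction network.
print(visible.astype(int))         # [[1 1 1 1 0 0 0 0]]
```

A full implementation repeats this march for every beam direction and composes the per-beam results into a 3D mask.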

More refined 3D voxel encoding lifts occupancy prediction by more than 5%

In the 3D voxel feature encoding module, the team applies deformable convolution, with its large receptive field and strong encoding capacity, to the 3D voxel data to improve the representation of 3D features.

The deformable 3D convolution (DCN3D) was implemented and optimized in CUDA, which greatly improves the model's computing speed and effectively reduces GPU memory consumption.

By replacing traditional 3D convolution with DCN3D, the overall prediction ability of the model is increased by more than 5%.
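The core idea that distinguishes deformable from ordinary convolution is that each kernel tap samples the feature volume at a learned, shifted position instead of a fixed grid offset. The sketch below shows that sampling step in NumPy with nearest-neighbor rounding and dummy weights; it is a conceptual toy, not the team's DCN3D, which learns the offsets, uses trilinear interpolation, and fuses everything into a CUDA kernel.

```python
import numpy as np

# A 4 x 4 x 4 feature volume with distinct values per voxel.
feat = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)

center = np.array([1, 1, 1])
# Regular 3 x 3 x 3 kernel taps around the center voxel.
base_taps = np.array([[dz, dy, dx]
                      for dz in (-1, 0, 1)
                      for dy in (-1, 0, 1)
                      for dx in (-1, 0, 1)])

# "Learned" offsets: here a fixed 0.6-voxel shift along x, for illustration.
offsets = np.zeros(base_taps.shape, dtype=np.float32)
offsets[:, 2] = 0.6

# Deformable sampling: read the volume at shifted positions
# (nearest-neighbor here; real DCN3D interpolates trilinearly).
sample_pos = np.rint(center + base_taps + offsets).astype(int)
sample_pos = np.clip(sample_pos, 0, 3)        # stay inside the volume

samples = feat[sample_pos[:, 0], sample_pos[:, 1], sample_pos[:, 2]]
weights = np.full(27, 1.0 / 27)               # dummy uniform kernel weights
out = float(samples @ weights)
print(out)
```

Because the sampling positions adapt to the data, the effective receptive field can stretch toward an object's actual shape, which is what makes this attractive for irregular occupancy volumes.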

In addition, building on open-source large models, the Inspur Information AI team also improved a multimodal model's perception and understanding of autonomous driving BEV images by optimizing the image encoder and feature fusion alignment, and by applying CoT (Chain of Thought), GoT (Graph of Thought), and prompt engineering. With a score of 74.2%, the team took fifth place in the "Application of Large Language Models in Autonomous Driving" (LLM4AD) track of this year's CVPR Autonomous Driving International Challenge.

In 2022, the Inspur Information AI team took first place in the nuScenes pure-vision 3D object detection task, lifting the key NDS metric to 62.4% in one stroke.

In 2023, the team won the championship again, setting a new track record of 77.6% for 3D object detection.

From pure-vision BEV to multimodal BEV, and now to the "F-OCC" model topping the grid and motion estimation (Occupancy & Flow) track of the CVPR 2024 Autonomous Driving International Challenge, the Inspur Information AI team has fought its way through one contest after another, building strong support and experience for exploring higher levels of autonomous driving technology.

We look forward to seeing what this team will do in the future!

*This article is published with permission from QbitAI; the views are solely those of the author.

— END —

QbitAI · Toutiao account

Follow us to be the first to know about cutting-edge technology trends
