laitimes

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

--Collect the "Automotive Driving Automation Classification" (GB/T 40429-2021)--

The ability of unmanned platforms to achieve autonomous positioning and navigation in large-scale environments is becoming increasingly demanding, and synchronous positioning and mapping technology (SLAM) based on lidar is the mainstream research solution. In this work, this paper systematically summarizes the framework and key modules of 3D LiDAR SLAM algorithm, analyzes and expounds the research hotspots and future development trends in recent years, sorts out the evaluation criteria for the performance of 3D LiDAR SLAM algorithm, and selects 6 kinds of open source 3D LiDAR SLAM algorithms that are more mature and representative at present to be tested and evaluated in the Robot Operating System (ROS), based on the KITTI benchmark data set, from the KITTI official accuracy standard, A horizontal comparison of SLAM algorithm accuracy index, algorithm time consumption and processing frame rate3 shows that the LIO-SAM algorithm has outstanding performance among the six selected algorithms, and the RMSE data of absolute trajectory error (ATE) and relative pose error (RPE) are 1 in the test of 00 sequence dataset. 303 and 0. 028, The frame rate (fps) processed by the algorithm is 28. 6, Finally, the application trend of 3D lidar SLAM technology is discussed according to citeSpace analysis.

With the continuous advancement and exploration of mobile robots and unmanned platforms in various application scenarios, the requirements for the positioning ability of mobile carriers are also getting higher and higher. Increasingly upgraded intelligent driving applications, especially more advanced unmanned driving, in order to ensure safety, the demand positioning accuracy to reach the centimeter level, the existing positioning input source is mainly based on the global positioning system (global positioning system, GPS) and other satellite positioning means, while the traditional GPS positioning and navigation accuracy is only 5 ~ 10 m, the current solution to improve the positioning accuracy is roughly divided into two categories.

One is to upgrade the satellite signal positioning input source, such as upgrading to a higher precision real time kinematic (RTK), or upgrading the low-cost micro electromechanical system (MEMS) inertial navigation equipment that assists in reckoning navigation and positioning to a higher precision level of inertial navigation, although such methods can improve positioning accuracy in most use cases, but in GPS In scenarios where signals are easily lost, such as parking lots, staggered elevated roads and indoor environments, there are still defects and high costs, while the estimation and positioning of inertial conduction itself has initialization and cumulative error problems.

The other is the introduction of new methods for realizing autonomous positioning sources based on the first type of defects proposed by researchers, such as the use of lidar, millimeter wave radar, cameras and other perception sensors, so the autonomous positioning scheme based on various types of sensors has begun to receive widespread attention from researchers, especially the autonomous positioning based on lidar and depth camera, which has become a hot spot scheme in recent years, this paper mainly elaborates on the autonomous positioning scheme based on multi-line lidar.

Simultaneous localization and mapping (SLAM) is widely used in mobile robots and unmanned driving as a technique for attitude estimation and positioning in unknown environments.

The concept of SLAM was first proposed by Cheeseman in 1986 at the IEEE Conference on Robotics and Automation, which collects information through sensors, generates a map of the environment explored by the unmanned platform and locates it, realizing the autonomous movement of the unmanned platform [1]. From the definition of this technology, it can be seen that the technology consists of two parts: map construction and positioning. Map construction is to project the sequential lidar point cloud or visual features collected by the sensor from the local coordinate system of each frame to the global coordinate system, and then complete the map stitching and positioning, that is, to obtain the position and attitude information of the mobile carrier in the built map.

On the one hand, obtaining the position and attitude of the local coordinate system corresponding to the data collected by the sensor is the key to building the map, that is, the map contains the positioning problem; On the other hand, building an accurate map is the premise of precise positioning, so the two aspects of positioning and mapping are highly coupled and can be used as a solution to a problem. Online real-time SLAM technology is being studied extensively as the core of the autonomous positioning field.

At present, the development of 2D laser SLAM technology has been relatively mature, which can be used for both civilian services such as sweeping robots and industrial sites, such as KUKA Navigation Solution [2]. The 3D SLAM technology based on laser point cloud uses three-dimensional laser sensors (generally multi-line lidar, and a small number of self-made single-line lidar combinations) to obtain three-dimensional spatial point data, and then conduct posture estimation through scanning matching between adjacent point cloud frames, and establish a complete point cloud map, which has the same technical principle as 2D laser SLAM.

3D lidar can directly collect a large number of points with accurate information on angles and distances through optical ranging, and the collection of these points is called a point cloud, which can reflect geometric information in the real environment. Because of its intuitive mapping, high ranging accuracy and not susceptible to changes in lighting and viewing angles, it is an indispensable sensor in map construction applications for outdoor large environments.

Thanks to the promotion of DARPA (U.S. Department of Defense Advanced Research Project Bureau Ground Challenge)[3-4], velodyne's multi-line lidar began to be used in unmanned vehicle SLAM [5], the positioning and mapping problems of mobile robots gradually changed from indoor to outdoor, and the map construction also expanded from two-dimensional to three-dimensional, the range was doubled, but the cost of multi-line radar was higher. With the mass production and popularization of multi-line lidar, as well as the reduction of power consumption and the enhancement of computing power of embedded processors, SLAM technology based on multi-line lidar is developing rapidly.

Because the three-dimensional SLAM method based on multi-line lidar has richer matching methods and better robustness in inter-frame matching, and can be integrated with image information, physical model and other information[6] to improve the positioning accuracy, it has great development potential. The relevant work on SLAM has been summarized before, especially the vision-based SLAM review has appeared more, but the laser-based SLAM summary work is relatively small, and it is mainly based on 2D lasers, and there is not much introduction involving the 3D liDAR SLAM algorithm. In this paper, we will systematically introduce and analyze the 3D laser SLAM technology, and compare and objectively evaluate the performance of the selected representative algorithms. This work paves the way for more in-depth research on 3D laser SLAM technology in the future, and we hope that this paper will provide some help to other researchers interested in 3D lidar SLAM technology.

1. 3D lidar SLAM scheme

The laser SLAM scheme that relies on lidar to build a map can be divided into two categories according to the solution method, based on filter-based and graph-based optimization, and the filter-based method is derived from Bayesian estimation theory, which is an early method to solve the SLAM problem, which has good effect in indoor or small-scale scene applications, but because only the current posture state and current environmental observation information of the moving carrier are considered, and there is no loop detection ability, there are problems such as linearization and low update efficiency[7-8]. In the process of program operation, it will also occupy exponentially increasing computing resources with the increase of the scene, which makes its performance effect in outdoor large-scale scenes relatively poor, and the laser SLAM scheme based on filters is mainly used in two-dimensional indoor small-range scenes.

Based on graph optimization, the SLAM scheme considers all the posture state and environmental observation information in the process of the mobile carrier, uses the graph formed by nodes and edges to represent a series of mobile robot postures and constraints, establishes and maintains the associated data, and can realize parallel computation independently of the front end, which is a more efficient and universal optimization method.

Compared with the early filter-based SLAM method, it is usually possible to obtain a map with better global consistency, and with the continuous development of the solution method, under the premise of the same amount of computation, the solution speed of the graph optimization SLAM has exceeded the filter method, which is the mainstream method in the current SLAM field, and it is also the main scheme taken by the three-dimensional laser SLAM, Hauke et al. [9] studied why the graph optimization method can achieve better results than the filter method.

Lu et al. [10] first proposed a 2D SLAM algorithm based on graph optimization in the 1990s, which used a constrained pose network to achieve data correlation, with a prototype of graph optimization. Gutmann et al. [11] formally proposed the graph optimization framework in 1999, which is roughly the same as the current mainstream graph optimization framework, with front-end scan matching, global optimization and closed-loop detection modules, but due to the technical development and cognitive limitations at that time, the sparseness of the system was not recognized, and real-time SLAM was not realized. Since then, many researchers at home and abroad have also continued to explore, which has made great contributions to the development of the diagram optimization SLAM method, and the various modules of the diagram optimization SLAM scheme framework are also gradually improved.

The SLAM algorithm based on two-dimensional laser is relatively mature, santos et al. [12] tested, evaluated and summarized five representative 2D laser SLAM algorithms based on single-line laser radar, and the results showed that Gmapping and KartoSLAM algorithms were more superior in the accuracy and efficiency of positioning and mapping. Then Google open sourced the Cartographer algorithm[13]

Using the current mainstream laser SLAM algorithm framework based on graph optimization, the branch delimitation method is proposed to solve the problem of sub-map construction and matching with global map, and realizes closed-loop detection and global optimization with better results, which is a representative of the more advanced and mature two-dimensional laser SLAM technology.

Two-dimensional lidar determines the distance and angle of the target in the environment by measuring the time difference and phase difference between the rotating scanned laser signal and its echo on the horizontal plane of determining the height, and establishes a two-dimensional polar coordinate system on the horizontal plane according to these two types of data to express the perceived environmental information, which can be regarded as a single-threaded three-dimensional lidar.

Compared with the two-dimensional lidar that can only perceive a single plane information in the environment and is suitable for indoor geometry to achieve small area map construction, three-dimensional lidar can further obtain richer information on the elevation, and has a better perception effect for outdoor large scenes.

Lidar according to the number of lines can be divided into single line, 4 / 8 lines and 16 / 32 / 64 / 128 lines 3 categories, with the increase of wiring harnesses, lidar can perceive the environment of the information more abundant, the amount of data obtained is correspondingly larger, the cost of the equipment is also multiplied, so the laser-based SLAM algorithm needs to be considered on the line beam, to achieve better real-time need to deal with the reduction of the amount of input data per frame, and less initial data because the wiring bundle is sparse and can not reflect the environmental information well.

The current 3D lidar SLAM algorithm research is mostly based on 16/32/64 line lidar, while unmanned driving applications pursue higher precision 128 lines, as shown in Figure 1. Based on multi-line lidar, 3D lidar SLAM technology inherits and develops the SLAM algorithm framework based on graph optimization, and applies it to unmanned driving and other fields to solve the problem of positioning and mapping of large scenes.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 1 Multilinear lidar

1.1 Diagram optimization method

Graph optimization SLAM research basis is based on graph theory, graph (graph) is a data structure, by the vertex (vertex) and the edge (edge) connection vertex represented as G(V, E), where G represents the graph, the set of vertexes is represented as V, the set of edges is represented as E, its idea is to represent things with vertices, and the edges connecting different vertices are used to represent the relationship between things, if there is a vertex in figure G connecting more than two edges, then the graph is called a supergraph, in SLAM The process of constructing and optimizing the supermap based on the existing observational data is studied.

Assuming that the pose node of the moving carrier is represented by μ = { μ 1, μ2, ..., μ n } and the landmarks in the environment are represented as S =, the postures and landmarks of the mobile platform can be represented in Figure 2.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 2 "Figure" in the SLAM problem

If at a certain time k, the moving carrier scans the observation through the laser sensor at a certain time to obtain data, the observation equation of the sensor is:

Since there is an error in the system parameters and sensor observations, making it impossible for the above formula to be precisely equal, so the error exists if you put:

Equation (2) as a target function, optimized as a variable, you can solve the estimation of the moving carrier posture, so as to calculate the estimated trajectory of the platform movement. Specific to the SLAM problem, the vertex is represented as the posture of lidar and the pose of the feature point, while the edge represents the observation equation, and the expression of the observation equation has a variety of forms, which can be defined as the constraint between the different postures of the moving platform, and can also be defined as the coordinate expression of a certain spatial point observed by the moving platform at a certain location.

Generally, the transformation matrix calculated by the odometry or registration between the poses is transformed into an optimization problem in the solution diagram. The model representation of the graph optimized SLAM can also be interpreted from the perspective of the spring energy model [14-15], as shown in Figure 3, the maximum likelihood estimation of the posture in SLAM, the minimum energy state of the corresponding system in the spring model, and the essential problem of both can be converted to nonlinear least squares problems.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 3 Optimize the SLAM model

1.2 Figure Optimizing the SLAM Scheme Framework

The SLAM scheme based on graph optimization can be divided into four parts: scan matching, closed-loop detection, back-end optimization, and point cloud map storage. Scan matching uses sensor data such as lidar, inertial measurement unit (IMU) and encoder for scan matching, and uses the relationship between adjacent frames to estimate the motion attitude of the current frame, and obtains the relative posture and map in a short period of time, considering the problem of local data correlation.

Since long-term incremental scan matching inevitably causes error accumulation, closed-loop detection can optimize the posture by comparing the current frame with the historical keyframe, check the matching to establish the constraint relationship between nodes, reduce the drift error of the global map, and consider the global data association. If it is from the representation of graph-based optimization, both scan matching and closed-loop detection are aimed at establishing the nodes of the graph and the constraints between the nodes based on the observation information, that is, to complete the construction of the graph. Researchers have divided the two together into the front-end section of graph-optimized SLAM [14-15].

Due to the existence of system parameter errors, observation noise, and matching errors, the consistency of the posture diagram constructed by the front-end module is poor, and there is usually a "conflict" between the constraints of the edges and edges of the built graph. If a relative transformation matrix representing inter-frame matching is used, then T 0 , T 1 , T 2 , ..., T n constitute a closed loop, which ideally should satisfy T 0 T 1 T 2 ... T n = I, where I represents the identity matrix, but the relative transformation matrix obtained by the front end in real engineering generally does not achieve this result. Different from the front-end part, the graph optimization part is a non-linear optimization of the graph information constructed by the front end, to obtain the optimal solution that satisfies all the constraint relationships as much as possible, and finally output the attitude estimation result and the global point cloud map, which is also known as the SLAM back end, and together with the SLAM front end, the entire graph optimization SLAM framework is formed.

1.2.1 Scan Match

For front-end scan matching, representative 3D point cloud matching algorithms can be broadly divided into two categories: match-based methods and feature-based methods. The matching method can be divided into two types: distance judgment and probability model judgment based on the matching method, the method based on probability model judgment is mainly normalized distribution transform (NDT) algorithm [16], the method based on distance judgment is mainly ICP algorithm and its variant algorithm, and the algorithm suitable for 3D lidar SLAM includes PP-ICP [17], NICP [18] , IMLS-ICP [19], etc., the representative algorithm is the generalized iterative closest point (GICP)[20], the algorithm principle is to combine the ICP algorithm and the PL-ICP algorithm into the probability framework model for point cloud registration, which improves the applicability and accuracy of the algorithm.

Algorithms based on matching generally achieve accurate estimation by directly using scan points, which require the use of a large number of points for stable registration, although the matching accuracy is high, but usually the calculation efficiency is not high. At present, the latest algorithm that can quickly and accurately achieve 3D laser point cloud matching registration is a voxelized generalized iteration of the nearest point algorithm [21], which extends the GICP method through voxelization, avoids the costly nearest neighbor search, and maintains the accuracy of the algorithm, combining the advantages of ICP-like and NDT algorithms.

Feature-based methods improve computational efficiency by extracting feature points from scanned point clouds, including LOAM [22] using corner and planar point features, multi-resolution surfel maps using three-dimensional raster matching algorithms using surface element features, and many studies focusing on point cloud registration [23-27] focusing on radar point cloud feature descriptions, such as fast point features Histograms, FPFH)[23], VFH (viewpoint feature histogram)[24], etc., this method can improve the cost of computing and improve real-time, so it has been widely studied. A classic feature-based point cloud registration algorithm typically includes several steps in key point detection, feature descriptor extraction, true matching, outlier culling, and transformation estimation.

1.2.2 Closed-loop detection

Closed-loop detection is based on global data correlation, which is the core step to achieve robust SLAM, which can correct the accumulated error by identifying whether to reach the historical scene to promote the ability of the map to close the loop, thereby generating a globally consistent mapping map. But correspondingly, the result of the wrong closed loop will seriously affect the accuracy of the back-end optimization, and even directly lead to the poor performance of the final map. The difficulties of closed-loop detection are mainly reflected in:

1) Perceive ambiguity. For example, in scenes with very similar structures such as promenades, tunnels, and stairs, it will make it more difficult to judge.

2) Due to the sparseness of the laser sensor itself, the robustness and differentiation of the observational data are limited, that is, how to establish an easy-to-handle environmental effective representation.

3) The data scale will increase with the running time, which will lead to the continuous growth of the frame data that needs to be judged, which will reduce the real-time nature of the map.

Laser-based scene recognition is committed to seeking an effective and concise position descriptor, and the existing closed-loop detection technology has a node search algorithm based on MonteCarlo[28], and can also be assisted by gps-assisted closed-loop judgment; There is a loop detection algorithm based on the descriptor, which is used to identify the scene by extracting the local or global scene descriptor, and the local descriptor represents the algorithm FPFH [23]: the local surface normal vector is used to calculate the local descriptor, Bosse et al. [29] propose a probabilistic voting method based on the Gasalt3D descriptor, which is composed of geometric information and strength information isHOT [30] ;

Global description subrepresentation algorithms include the GLAROT method[31] which encodes the relative geometric positions of key pairs into a histogram,[31] the scan context method that projects laser scans onto global descriptors,[32] and so on, BoW(bag of words)[33] a scene recognition algorithm based on the bag-of-words model, and FAB-Map(fast appearance based mapping)[34] and DBoW2 [35] and other methods, but these methods were initially used for visual SLAM, such as ORB SLAM [36] and LDSO [37].

1.2.3 Backend graph optimization

Back-end optimization is a process that integrates the posture and inter-frame motion constraints of each frame radar to achieve overall optimization, which can eliminate local cumulative errors, and in large-scale mapping, it is generally necessary to have a "regulator" to coordinate the previous trajectory at all times, which is the back-end optimization of SLAM.

Review[38-39] The existing systematic introduction of optimization methods can be summarized into four categories: optimization methods based on least squares method, optimization method based on relaxation iteration, optimization method based on stochastic gradient descent, and manifold iteration method. Current open source optimization libraries based on graph optimization include iSAM (incremental smoothing and mapping)[40], GTSAM (georgia tech smoothing and mapping)[41], G2O (general graph optimization)[42], Ceres [43], BA (bundle adjustment)[44], etc. These optimization libraries save time iteratively solving the optimized values at the back end.

1.2.4 Map representation

The easiest thing to get from the 3D lidar SLAM algorithm is the point cloud map, as shown in Figure 4(a), which can visually describe the surrounding environment and has the basic environment display function, but due to the large number of point clouds, it generally needs to be downsampled through voxel filtering to display normally.

In addition, because the simple point cloud cannot represent obstacle information and does not have features, it cannot be directly used for navigation and obstacle avoidance and feature-based positioning, but can be used to achieve positioning research based on point cloud registration. On the basis of the point cloud map, according to the purpose of use, the features extracted from the point cloud map can be grouped together to form a sparse feature map using features for positioning, as shown in Figure 4(c), and can also constitute an occupied grid map for navigation obstacle avoidance and path planning and an octagon tree map with better compression performance, as shown in Figure 4(b).

Occupying the grid map often takes the way of raster or voxels, in the minimized raster or voxels with probability or 0 ~ 1 to represent the occupied state, CSM [45] constructed a two-dimensional occupied raster map, OctoMap [46] proposed to transform the point cloud map into a three-dimensional occupancy map, octopus map is a special occupation raster map, the structure of the same probability of the raster can be merged, thereby reducing the space for storing the map.

In recent years, the emergence and development of semantic maps has also improved the machine's understanding of the surrounding environment, as shown in Figure 4(c), semantic maps combine geometry and content perception, helping to improve the accuracy of SLAM mapping and positioning, and can also better improve the machine's ability to intelligently understand the environment.

2. Research hotspots and development trends

Lidar point clouds have the characteristics of local sparseness, large data volume and noise caused by dynamic objects, which has become a difficult point in SLAM research based on laser point clouds. Compared with the image matching problem, the sparseness of the point cloud makes it often not feasible to find two exact matching points from the source and target point clouds, and the appearance of the same object observed by the laser scanner from different angles varies greatly, which increases the difficulty of feature extraction.

Secondly, because the laser scanner will produce millions of points per second, it requires efficient algorithms and powerful computing units, which has high requirements for the real-time nature of the algorithm. Finally, dynamic object point clouds can cause noise interference, and whether the method of processing it properly is also crucial to achieve the ideal high-precision estimation. In addition, the constant change of scenes is considered to be the biggest challenge in solving the 3D LiDAR SLAM problem.

The use of laser point clouds to solve the problem of three-dimensional map construction in SLAM was first proposed in 2014 by Dr. Zhang et al. of Carnegie Mellon University, that is, the famous LOAM (lidar odometry and mapping)[22] algorithm, which extracts the laser point cloud line surface features in a novel way to reduce the amount of computation, and creatively divides the motion estimation problem into two independent algorithms to complete it together. One algorithm performs a high-frequency odometer but low-precision motion estimation, the other algorithm runs a low-frequency matching map but returns a high-precision motion estimate, and finally fuses the two data into a high-frequency high-precision motion estimate, which is a good trade-off between accuracy and efficiency, high real-time, and the only drawback is the lack of loopback detection.

After that, LeGO-LOAM [47] derived a new framework on the basis of LOAM, which improved lightweight and ground optimization in feature extraction, and added loop detection, and the whole framework coincided with the current mature map optimization 3D lidar SLAM scheme framework, and the built map was more perfect. hdl_gragh_slam [48] is also the standard graph-optimized 3D lidar SLAM framework, and the novelty is that gps, IMU and pavement constraint information are integrated in the back-end optimization process, which can better build a globally consistent map. With the advent of multi-line lidar, researchers at laser SLAM have made great progress in the field of three-dimensional laser SLAM, and the main research hotspots and trends have also focused on the key modules of the 3D lidar SLAM framework.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm
10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 4 Map representation

2.1 Laser inertial odometer

The laser odometer (LiDAR odometry, LO) is only based on the point cloud registration algorithm to obtain the global posture of the current frame, but because Lidar itself will be affected by sparseness and motion disturbance, the perception accuracy will be reduced, and the number of perception points will be seriously reduced in some Lidar degradation cases, which will greatly affect the accuracy of the laser odometer. Researchers have fused IMU data to ensure and improve the accuracy of laser odometers. One category is the loose coupling method, which considers the estimation of Lidar and IMU, respectively, in exchange for a small computational load at the expense of lost information.

For example, in LOAM, it relies on the IMU solution orientation as an aid but assumes uniform motion, using the IMU as the a priori of the entire system, but it cannot utilize IMU measurements for further optimization.

Another category is the tight coupling method, which provides state prediction through IMU measurements, while measurement updates are used to correct the predicted state, also known in the field as lidar inertial odometry (LIO), LIO_mapping [49] first proposed the closely coupled Lidar-IMU fusion method, by jointly optimizing the measurement data of the IMU and Lidar, even in the case of Lidar degradation, there is no obvious drift, Compared with the loosely coupled method, the tightly coupled method will greatly improve the accuracy.

Liom proposed a new close-coupled laser inertial odometer and mapping method to achieve low drift and robust pose estimation, and the laser inertial odometer module uses an error-state Kalman filter (ESKF) to achieve sensor fusion of laser scanner and IMU [50].

MC2SLAM proposes a new tight coupling method, which unifies point cloud distortion compensation and point cloud matching into one optimization task, and uses IMU pre-integration in the back end for posture optimization, and the accuracy is further improved

[51]。 LIO-SAM, as the author of LeGO-SLAM, implements a tightly coupled laser-inertial odometer based on a graph optimization framework, optimized using a factor graph rather than filtering method.[52]

As the basic module of the entire 3D LiDAR SLAM scheme, the laser odometer module can directly perform pose estimation and map construction, and the output results of the module can be directly exported under ideal circumstances, and its accuracy directly affects the final result, so finding an efficient and universal method to achieve tight coupling of laser and inertial navigation data and improve accuracy is still the focus of future research.

2.2 Scene recognition

Scene recognition is one of SLAM's key tasks. Scene recognition has two functions, one is used for the recognition of historical scenes, and the other is used for posture estimation relative to the presence map, that is, repositioning. Dube et al. [53] first proposed the segmentation-based 3D point cloud scene recognition method SegMatch, which compromised the traditional method of scene recognition based on local features and global features, which can realize real-time and reliable detection closed-loop. The Segmap method [54] uses the same segmentation-based idea as Segmatch to propose a 3D point cloud segmentation descriptor based on deep learning, which improves positioning performance and enables semantic information extraction.

The SUMA method [55] uses Surfel maps to efficiently generate projection data associations and achieve closed-loop detection, which has been used in RGBD-SLAM for the first time by SUMA, which can establish a globally consistent map in a large-scale environment based only on laser point clouds.

DELIGHT relies only on lidar to achieve global positioning, using the reflection intensity information obtained by radar instead of conventional distance information for scene recognition, and the algorithm is divided into two stages, including a priori estimation based on density descriptors and a geometry-based verification, thus solving the "robot wake-up" problem [56].

The ISC method [57] is a global loop detection algorithm based on lidar point cloud, which uses the geometric information and density information of point cloud to construct a new global description sub-ISC. The basic ability of scene recognition is to be able to effectively describe map information, so the efficiency and robustness of map description is crucial, which can increase the robot's ability to understand the environment, and the combination of geometric information and semantic information in the form of description will be a possible trend in the future.

2.3 Backend optimization

How to efficiently correct the motion posture and improve the accuracy of the map through back-end optimization is also one of the hot topics of research. C-T SLAM [58] is a hierarchical, continuous-time 3D lidar SLAM algorithm that uses efficient local maps and hierarchical optimization backends to allow optimization correction during real-time map building. BALM introduced the greatly successful BA framework in the field of visual SLAM into the Laser Drawing Module to reduce the cumulative error in the drawing process.

The [52] back-end optimization of LIO-SAM uses factor graph optimization to integrate laser odometer factor, IMU pre-integration factor, GPS factor and loopback factor to obtain a more robust motion estimation. The existing back-end optimization is based on the correct data association, otherwise it will produce incorrect map results, which will limit the usability of back-end optimization, so it can automatically identify and delete the wrong associated data, and realize the robust back-end optimization method of building a correct map even if there is an incorrect constraint data.

2.4 Dynamic objects

Achieving accurate positioning in a dynamically changing environment is a difficult task, and most laser SLAM methods assume that the environment is static, because these methods rely on the measurements provided by lasers, and they are difficult to deal with the problem of the geometric degradation of the environment caused by the violent movement of objects. Therefore, how to achieve robust posture estimation in high-speed motion scenes is also one of the hot topics of research.

SuMa + [60] based on the SUMA framework proposed, the use of point-by-point semantic labels in each frame of the point cloud to build a global semantic map, the use of RangeNet + to reliably filter out the dynamic objects in the scene, improve the accuracy of map construction. LIO-LiDAR [61] combines laser inertial odometer (LIO) and LiDAR global positioning module into a posture optimization framework, using the complementarity of the two to solve the precise problem of long-term positioning in dynamic scenes.

LIOM considers the geometric degradation of dynamic objects in the process of environment map construction, which is not conducive to the construction of large-scale environments, and designs the dynamic object detection module to detect and remove point clouds generated by dynamic object disturbances every frame [50]. But LIOM processes all measurements at once, so it can't achieve high real-time performance.

2.5 Combining deep learning

To efficiently utilize all available scan data, deep learning-based approaches offer a potential solution for estimating the robot's pose directly from lidar data. Similar methods have been successfully applied to camera data and have proven promising results [62]. In the field of learning-based robot attitude estimation using lidar data, it has gradually become a trend to use deep learning to solve SLAM problems, and researchers have begun to use deep learning methods to improve the key modules in the 3D lidar SLAM framework, such as LO-Net [63], DMLO [64], DeepLO [65] and other feature matching estimation based on deep learning frameworks for laser odometer tasks. The DeepICP [66] end-to-end 3D point cloud registration deep learning framework proposed by the Baidu team takes into account the interference of dynamic objects and makes full use of the significant characteristics of static objects, so as to achieve high robustness. The RangeNet++ network is also used in SUMA++[60] to filter out dynamic objects in the scene.

In the closed-loop detection module, OverlapNet [67] is designed by using deep neural network, which is based on the closure of different clue search loops of LiDAR data, the information search circuit closure of different dimensions of LiDAR data, and the odometer results are combined with information such as range, normal vector, intensity and semantics to define the overlap rate, and the closed-loop results are detected and corrected.

In addition to the above research hotspots, the optimization of the point cloud matching algorithm is also constantly innovating, such as Fossel et al. [68] proposed a LiDAR SLAM front end, called NOctoSLAM. The scheme uses point-and-surface ICP for framemap pose matching, and octree instead of traditional kd-tree to build maps, which can quickly find data associations and calculate approximate normal vectors. Compared with the traditional kd-tree map-based point-polygon ICP algorithm, the efficiency is nearly twice as high.

Literature [69-70] Considering the characteristics of indoor more structured feature environment than outdoor, this paper proposes a laser inertial odometer method and a point cloud plane feature extraction method for indoor environment. In addition, the researchers also took into account the long-term repetitive positioning of maps, such as Egger et al. [71] based on 3D point cloud features proposed a new point cloud map representation Method PoseMap, to ensure the reliable long-term positioning of robots in a dynamic environment, the new map representation method allows maps to be updated and expanded online.

3. Algorithm evaluation test

3.1 Datasets

Research and application of 3D lidar SLAM mobile carriers are unmanned vehicles, unmanned ships and unmanned aerial vehicles, and the current laser point cloud data set is mainly for the automatic driving scenario, the data collection work in the outdoor scene is large and cumbersome, involving time synchronization, coordinate calibration and calibration between various sensors, etc., the open data set saves the data preparation of the algorithm research, and the sequence and reference framework provided are also conducive to the development of the algorithm.

At present, the laser point cloud datasets published in the field are: KITTI dataset [72-73], which is currently the largest evaluation dataset in the international autonomous driving scenario and the most commonly used dataset in academia; Waymo Dataset[74], Data Open Project of Waymo, an autonomous driving company; PandaSet data set[75], the capture scenario is located in San Francisco, to develop safe and controllable autonomous driving technology in complex environments and extreme weather. Oxford Robotcar dataset[76], a public dataset proposed by the Robotics Laboratory of the University of Oxford;

USVInland dataset[77], modeled on the KITTI dataset model, researchers from Tsinghua University and Northwestern Polytechnical University and Oka Zhigang jointly published a set of multi-sensor datasets for unmanned ships in inland waterways, which is also the world's first unmanned ship dataset containing laser point cloud data in inland river scenarios. The laser point cloud public data set for the UAV platform has not yet appeared, and academic research is still mostly self-collecting data.

3.2 Algorithm performance evaluation criteria

In recent years, academic research on 3D LiDAR SLAM algorithms has been carried out almost exclusively on the KITTI dataset [47,51-52,55,57-67], which was captured while driving cars in a variety of road environments with dynamic objects and vegetation, such as highways, country roads, and urban areas, with travel speeds of up to 90 km/h. The point cloud captured by the on-board Velodyne HDL-64ES2 has been deflected, and the packets also contain IMU, GPS, and image data.

Thus, the dataset is allowed to evaluate any trajectory obtained by the 3D LiDAR SLAM method, which can be fed directly into the proposed method for evaluation comparison. The performance of the proposed algorithm is generally evaluated based on the following three criteria:

1) KitTI official standards [72-73]:

The KITTI odometer benchmark includes a total of 22 sequences of data, with only 11 (serial number 00-10) with real tracks officially provided, and 11 (serial number 11-21) data sets without real tracks for KITTI official evaluation comparison. The evaluation method is to calculate the translation error and rotation error of different lengths (100 to 800 m) in the data set relative to the real trajectory, and calculate the average.

2) Accuracy index:

Relative pose error (RPE)[78] is used to describe the accuracy of the two frames of posture difference with a certain time difference, that is, after aligning with a timestamp, the change of the real posture and the estimated posture is calculated at intervals of the same time, and then the difference between the two is made, so as to obtain the relative posture error, and then the relative posture error of each period can be calculated by rms error RMSE to obtain the overall value; Absolute trajectory error (ATE)[78] describes the direct difference between the estimated pose and the real pose, which can very intuitively reflect the accuracy of the algorithm and the global consistency of the trajectory.

3) Algorithm time consuming and processing frame rate:

As an indicator of computational efficiency, the time spent processing the sequence of KITTI mileage data sets is counted and the frame rate is calculated, which can intuitively reflect the computational load and real-time performance.

In addition to public datasets such as the KITTI dataset, data packets recorded by multi-line lidar can be used to evaluate the performance of the laser SLAM algorithm based on accuracy indicators.

3.3 Algorithm performance evaluation experiments

In this paper, six currently open source 3D lidar SLAM algorithms are selected for testing and evaluation, as shown in Table 1. We applied it to the robot operating system (ROS), where all algorithms were evaluated and compared in experiments based on kitti's public dataset benchmarks. In order to evaluate the performance of the tested algorithm, the experimental results are collected under the same conditions and the performance measurement is performed.

Table 1 Six open source 3D lidar SLAM algorithms selected

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

We enter the data with serial number 07 into each algorithm to obtain the constructed point cloud map. Figure 5 shows the point cloud map built by the representative 5 algorithms, the point cloud map built by the SUMA algorithm is basically the same, you can see that the difference in the performance of the algorithm cannot be judged by the map construction effect alone, only in Figure 5(e) The Cartographer algorithm is slightly inferior in the construction result, the circle A in Figure 5(e) represents different perspectives of the same part, and does not form a closed loop, so we need to use 3. The algorithm performance evaluation criteria described in Section 2 are evaluated.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm
10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 5 The algorithm outputs a point cloud map of the 07 series data

1) Kitti official standards

Table 2 lists the results of the data measurements for serial numbers 00, 05, and 07 in the KITTI dataset, from litamin2 [79] And the experimental equipment used by its authors is the Intel Corei9-9900K with 32 GB of RAM and the NVIDIA GeForce RTX 2080 Ti desktop.

KITTI also officially provides a precision evaluation tool KITTI_odometry_evaluation_tool specifically for data translation errors and rotation errors, and the evaluation results are shown in Figure 6.

2) Accuracy index

This paper uses the evo evaluation tool to test experiments, mainly through the accuracy index RPE and ATE to evaluate the performance of the algorithm. The experimental hardware devices are Intel Core i7-10700 and NVIDIA GeForce RTX 3070 desktops with 16 GB of RAM, with a system environment based on Ubuntu18. 04 Ros system for operating systems. Data from the KITTI dataset with serial numbers 00, 05, and 07 was selected for test evaluation of the algorithm.

Table 2 KITTI statistics of six open source 3D lidar SLAM algorithms selected

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm
10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 6 KITTI_odometry_evaluation_tool translation error and rotation

As shown in Figure 7, the trajectory and pose results of the final output of the five algorithms of 00 series data are compared, where the KITTI_00_gt dashed line represents the trajectory truth value of the sequence. From the y direction in Figure 7(a), it can be intuitively concluded that the output results of the hdl_graph_slam and LIO-SAM algorithms are closest to the real trajectory, indicating that the algorithm has a more accurate estimated trajectory than other algorithms.

As can be seen in Figure 7(b), the output of each algorithm is different from the true value of the attitude angle, but it can also be seen from the pitch angle that the output of the hdl_graph_slam and LIO-SAM algorithms is closest to the true value. As shown in Figure 8, the LeGO-LOAM algorithm obtained by the evo tool outputs the ATE and RPE results of the trajectory translation error, and the APE in the figure is equivalent to ATE. The rmse data of ate and RPE of atE and RPE of five algorithms except the Cartographer algorithm 00 sequence are counted in Table 3, and the data in the table also reflect the superiority of the LIO-SAM algorithm over other algorithms in terms of trajectory error.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm
10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm
10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 7 The evo tool is used to compare the output trajectories of the algorithm

3) Algorithm time consuming and processing frame rate

The test experiment in this article counts the time consumption and processing frame rate of 5 algorithms, table 4 data are the experimental statistics of this article, intel core i7-10700 and NVIDIA GeForceRTX3070 with 16 GB RAM hardware devices.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm
10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 8 The Evo tool is used for the evaluation results of ate and RPE of the algorithm

Table 3 Root mean square (rmse) data of five algorithms ATE and PRE of the 00 sequence

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Table 4 Time-consuming and processing frame rate statistics based on each algorithm in the KITTI dataset

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Table 4 data can be seen that the Lego-LOAM algorithm performs best in terms of time consumption and processing frame rate, followed by suma algorithm, LIO-SAM algorithm also has extraordinary performance, LOAM and hdl_graph_slam two algorithms perform averagely, and Cartographer algorithm has a large gap in real-time compared to other algorithms, which is not a good choice under the requirements of real-time mapping.

In this test experiment, in addition to the Cartographer algorithm, other algorithms have achieved real-time performance, and can even accelerate processing. According to the test data in Table 3 and Table 4, although Lego-LOAM optimizes the LOAM algorithm framework to achieve computational lightweighting, which greatly improves the computational efficiency of the algorithm, it lacks accuracy. In the selected algorithm, the LIO-SAM algorithm shows superiority in terms of accuracy, and the real-time performance is also guaranteed, and the performance is superior in the selected algorithm. Although the Cartographer algorithm is superior in indoor 2D mapping, it is difficult to achieve 3D mapping in a large-scale outdoor environment.

4. Analysis of technology application trend

The vigorous development of 3D lidar SLAM technology in recent years is obvious to all, and its core topics are synchronous positioning and map creation. We use Citespace to map scientific knowledge for 3D lidar SLAM technology applications to study technology application trends. We retrieved the results of the keyword co-occurrence analysis of the literature from the CNKI Academic Journal Library and the WOS Core Collection, and Figure 9 shows the results of the joint emergence analysis of the keywords of the searched literature, for the papers from 2000 to 2021, of which cnKI retrieved the subject word "3D LiDAR SLAM" and the WOS retrieved the subject word was "3D Lidar SLAM", and then the literature was screened for field relevance, and finally obtained 207, which is closely related to the application field of 3D LiDAR technology articles Chinese and 346 articles in English. From the co-occurrence results, it can be seen that the current application fields mainly include the field of mobile robots and the field of surveying and mapping.

Mobile robot field: At present, service robots have become a popular application scenario of 3D lidar SLAM technology, mainly logistics distribution robots, tour guide robots, inspection robots, etc. In principle, driverless cars also belong to mobile robots, the current equipment competition in the field of unmanned driving is becoming more and more hot, with the improvement of urban Internet of Things and intelligent systems, unmanned driving is the trend of the times. Unmanned ships and unmanned aerial vehicles can avoid obstacles and path planning, and the realization of autonomous intelligence also requires the support of SLAM technology. Aiming at the long image building period and large computing power demand of a single platform lidar, it is the trend of future development to study the multi-vehicle collaborative map building method, increase the reuse of maps between platforms, improve the efficiency of map construction, and achieve payload balancing.

Field of surveying and mapping: efficient real-time acquisition of 3D fine model, rapid three-dimensional reconstruction of the target area is the core topic of surveying and mapping, 3D lidar SLAM technology makes real-time rapid reconstruction possible, has been successfully applied to a variety of mobile mapping platforms, such as mobile backpack SLAM, handheld SLAM scanning system, cart SLAM, unmanned SLAM, unmanned ship SLAM, etc., application scenarios include indoor mapping, mine surveying, forestry surveying, site surveying, offshore platform mapping, etc.

Due to the high precision of surveying and mapping requirements, most mobile mapping platforms cannot allocate more computing resources for back-end optimization and construction, and generally use offline processing or upload real-time data to high-performance servers for calculation, so as to achieve high-precision reconstruction results. Under the premise of ensuring accuracy, how to improve the optimization algorithm to improve the data processing speed to achieve real-time, and how to overcome the mismatch between a large number of similar scenarios, is still the direction and difficulty of future exploration.

10,000 words to explain the autonomous driving 3D LiDAR SLAM algorithm

Figure 9 Joint emergence analysis of Chinese keywords

3D lidar SLAM algorithm in the traditional key modules still need to achieve better lightweight, accuracy, robustness and generalization, semantic map and deep learning integration has become a trend, and other can achieve self-positioning source of sensors such as depth camera, millimeter wave radar, etc. multi-source fusion is also the current research hotspot, 3D lidar SLAM technology on the unmanned platform to achieve independent intelligence of the development will have a profound impact.

This paper tests the open source algorithm according to the algorithm performance evaluation criteria of the combing, and preliminarily draws the evaluation conclusion, and due to the open source problems of the algorithm, such as the original project of the LOAM algorithm is not open source, and the hdl_graph_slam and LIO-SAM algorithm are integrated into the GPS data, limited by the experimental conditions and the input of the open source algorithm, this paper only uses point cloud and IMU data for testing, and the experiment itself has certain defects, and it is impossible to conduct more in-depth test evaluation of each algorithm. In the future work we will try to integrate multi-source data for more comprehensive and in-depth research.

bibliography

[1 ]SMITH R,SELF M, CHEESEMAN P.Estimating uncertain spatial relationships in robotics [ C ].Proceedings of Conference on Uncertainty in Artificial Intelligence, 1988.

[2] CADENA C, CARLONE L, CARRILLO H, et al. Past,present, and future of simultaneous localization and mapping: Towords the robust-perception age[J] . IEEETransactions on Robotics, 2016, 32(6) : 1309-1332.

[3] MONTEMERLO M, BECKER J, BHAT S, et al. Junior:The stanford entry in the urban challenge[J] . Journal of Field Robotics, 2008, 25(9) :569-597.

[4] LEVINSON J, ASKELAND J, BECKER J, et al.Towards fully autonomous driving: Systems and algorithms[C] . Intelligent Vehicles Symposium, IEEE,2011.

[5] SHAN T, ENGLOT B, MEYERS D, et al. LIO-SAM:Tightly-coupled lidar inertial odometry via smoothing and mapping[C] . International Conference on IntelligentRobots and Systems (IROS) , IEEE, 2020.

Wang Changhong, Dou Hexuan, Chen Xiaodong, et al. Research progress of unmanned platform SLAM technology[J] . Navigation Positioning and Timing,2019,6(4):12-19. WANG CH H, DOU H X, CHEN X D, et al. Researchprogress in simultaneous localization and mapping forunmanned vehicles [J] . Navigation Positioning andTiming, 2019, 6(4) : 12-19.

[7] HUANG S D, DISSANAYAKE G. Convergence andconsistency analysis for extended Kalman filter basedSLAM [J] . IEEE Transactions on Robotics, 2007,23(5) :1036-1049.

[8] THRUN S, BURGARD W, FOX D. Probabilisticrobotics[M] . Cambridge, USA: MIT Press, 2005.

[9] HAUKE S, MONTIEL J M, ANDREW DAVISON J.Visual SLAM: Why filter? [J] . Image and VisionComputing, 2012, 30(2) :65-77.

[10] LU F, MILIOS E. Globally consistent range scanalignment for environment mapping [J] . AutonomousRobots, 1997, 4 (4) : 333-349.

[11] GUTMANN J S, KONOLIGE K. Incremental mapping of large cyclic environments [ C ] . Proc of IEEE International Symposium on Computational Intelligence in Robotics and Automation. Piscataway, NJ: IEEE Press,

[12] SANTOS J M, PORTUGAL D, ROCHA R P. Anevaluation of 2D SLAM techniques available in robotoperating system [C] . In: Proceedings of IEEE International Symposium on Safety, Security, and Re

Reprinted from the wisdom car expert, the views in the article are only for sharing and exchange, do not represent the position of this public account, such as copyright and other issues, please inform, we will deal with it in a timely manner.

-- END --

Read on