
A beginner's study notes on 3D detection and 3D semantic understanding (3D-understanding), continuously updated.


  • Understanding
    • 3D understanding
      • I. Learning only from point cloud
        • (1) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR 2017)
          • Overview:
          • Contributions:
          • Core idea of the algorithm:
          • Concrete implementation pipeline:
          • Experiments & results:
          • Limitations:
        • (2) PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (NIPS 2017)
          • Overview:
          • Contributions:
          • Core idea of the algorithm:
          • Concrete implementation pipeline:
          • Experiments & results:
          • Limitations:
        • (3) 2020 survey: Deep Learning for 3D Point Clouds: A Survey
          • Overview:
          • Contributions:
          • 3D Shape Classification:
          • 3D Object detection and tracking:
            • I. 3D Object Detection
            • **(4) Other Methods**
            • **2. Single Shot Methods**
            • **II. 3D Object Tracking**
            • **III. 3D scene flow estimation**
            • Summary
          • 3D Point Cloud Segmentation:
            • **2) Instance Segmentation**
      • 2D-3D joint learning
        • 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans (CVPR 2019)
          • Overview:
          • Contributions:
          • Core idea of the algorithm:
          • Concrete implementation pipeline:
          • Experiments & results:
          • Limitations:

Understanding

3D understanding

I. Learning only from point cloud

(1) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR 2017)

homepage. paper. author's walkthrough video (in Chinese). code.

Overview:

Previous work on 3D detection:

(1) Convert the irregular point cloud into regular voxel data and then run CNNs on the voxels, which consumes a large amount of computation.

(2) Hand-craft and compute features from the 3D data (depends on human expertise; low robustness).

(3) Project the 3D data onto 2D planes and then use 2D detection methods (this loses some important 3D information).

This paper pioneers a deep-learning approach (a neural network) that consumes only raw 3D point cloud data, achieving end-to-end 3D detection and mining more information from the data.

The paper can roughly accomplish detection, classification and segmentation of 3D objects.

Contributions:

• We design a novel deep net architecture suitable for consuming unordered point sets in 3D;

• We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;

• We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;

• We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.

• High robustness to missing points.

Core idea of the algorithm:

1. Feature extraction: map each 3D point into a higher-dimensional space with an MLP, then apply a symmetric operation (max pooling). Because the space is high-dimensional, information loss is avoided while permutation invariance is achieved. A function γ (an MLP) then produces the output scores.

2. Handling invariance to viewpoint transformations: (a transformation module based on the data itself)

The input n×3 points pass through a T-Net, which generates transformation parameters; the transformation function then produces another set of transformed n×3 points, which resolves the problem of differing viewpoints.

Overall this amounts to optimizing the points and their features so that the downstream network can process them more easily.

Concrete implementation pipeline:

1. Input n×3 points; the transformation network produces a new, optimized set of n×3 points.

2. A shared MLP maps each point into a higher-dimensional space, producing per-point feature vectors.

3. A second transformation network normalizes these features further, making them easier for the subsequent layers to learn from.

4. Another MLP maps the 64-dimensional features into 1024-dimensional features.

5. Max pooling, the symmetric operation, produces the global feature.

6. Cascaded fully connected layers produce the output scores for classification over K classes.

7. Segmentation: this amounts to classifying every point. Local and global features are learned jointly (each point's feature is, so to speak, looked up within the global feature to decide which class it belongs to), giving every point a local class label; points of the same class are then extracted, which realizes the segmentation.

8. A final MLP produces n×m scores (m classes).
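The steps above can be sketched with a toy, untrained version of the shared-MLP + max-pooling pipeline (the weights here are random, purely to illustrate why the symmetric max operation makes the global feature permutation-invariant; the 64→1024 sizes follow the dimensions mentioned above):

```python
import numpy as np

def pointnet_global_feature(points, w1, w2):
    """Map each point to a higher-dimensional space with a shared
    per-point MLP (two linear layers + ReLU), then apply the
    symmetric max pooling to get an order-invariant global feature."""
    h = np.maximum(points @ w1, 0.0)   # shared MLP layer 1 + ReLU
    h = np.maximum(h @ w2, 0.0)        # shared MLP layer 2 + ReLU
    return h.max(axis=0)               # symmetric function: max over points

rng = np.random.default_rng(0)
pts = rng.normal(size=(128, 3))        # an n x 3 point cloud
w1 = rng.normal(size=(3, 64))          # 3 -> 64 dims
w2 = rng.normal(size=(64, 1024))       # 64 -> 1024 dims

feat = pointnet_global_feature(pts, w1, w2)
feat_perm = pointnet_global_feature(pts[rng.permutation(128)], w1, w2)
```

Shuffling the input points leaves `feat` unchanged, which is exactly the permutation invariance the max-pooling step provides.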

Experiments & results:

Although this is the earliest deep network that processes 3D point clouds directly, its results are already comparable to the then-mature R-CNN approaches running on voxels.

Limitations:

Insufficient learning of local features.

(2) PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (NIPS 2017)

homepage. paper. author's walkthrough video (in Chinese). code.

Overview:

PointNet++ is the authors' own upgrade of PointNet. The network introduces a hierarchical point-cloud feature-extraction scheme that fixes PointNet's weak local feature extraction and improves learning; it also combines features learned at different scales within the same level, making the network more robust (it better suppresses the effect of missing points on classification).

Contributions:

1. Processes local point-cloud regions separately within the global cloud, then aggregates the features extracted from many different local regions; formally this resembles a CNN and lets the network recognize and segment objects better.

2. Achieves hierarchical feature learning and addresses translation invariance and permutation invariance.

3. Combines features learned at different scales within the same level, making the network more robust when part of the point data is missing.

Core idea of the algorithm:

Apply PointNet to the points within each small local region, then enlarge the region and apply PointNet again; this hierarchical point-cloud feature learning yields better results.

Concrete implementation pipeline:

By repeatedly applying PointNet at different scales and over different ranges, learning on point clouds formally becomes more like a traditional CNN, which improves detection accuracy and overall performance.
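The "local region" machinery above can be sketched as farthest point sampling (to pick well-spread centroids) plus a ball query (to gather each centroid's neighborhood, which PointNet is then applied to). This is my own minimal illustration, not the authors' code:

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: repeatedly pick the point farthest from the points
    already chosen, giving well-spread centroids for local regions."""
    n = points.shape[0]
    chosen = [0]                      # start from an arbitrary point
    dist = np.full(n, np.inf)         # distance to nearest chosen point
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)
        chosen.append(int(dist.argmax()))
    return np.array(chosen)

def ball_query(points, centroid_idx, radius):
    """Return, for each centroid, the indices of points within `radius`."""
    groups = []
    for c in centroid_idx:
        d = np.linalg.norm(points - points[c], axis=1)
        groups.append(np.flatnonzero(d < radius))
    return groups

pts = np.array([[0., 0, 0], [10., 0, 0], [0., 10, 0], [5., 5, 0]])
idx = farthest_point_sampling(pts, 3)      # 3 spread-out centroids
groups = ball_query(pts, idx, 6.0)         # local neighborhoods
```

Each group would then be fed through a small PointNet; stacking this sample-group-PointNet unit at growing radii is what gives PointNet++ its CNN-like hierarchy.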

Experiments & results:
Limitations:

This is a first-generation method for deep learning on raw point cloud data; its learning performance still needs substantial improvement.

(3) 2020 survey: Deep Learning for 3D Point Clouds: A Survey

homepage. paper. Continuously updated by the authors.

DOI: 10.1109/TPAMI.2020.3005434

Overview:

This paper surveys the existing deep-learning methods for processing point cloud data, covering three main categories:

(1)3D Shape Classification

(2)3D Object Detection and Tracking

(3)3D Point Cloud Segmentation

Contributions:
  1. To the best of our knowledge, this is the first survey paper to comprehensively cover deep learning methods for several important point cloud understanding tasks, including 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation.
  2. As opposed to existing reviews [19], [20], we specifically focus on deep learning methods for 3D point clouds rather than all types of 3D data.
  3. This paper covers the most recent and advanced progress of deep learning on point clouds. Therefore, it provides the readers with the state-of-the-art methods.
  4. Comprehensive comparisons of existing methods on several publicly available datasets are provided (e.g., in Tables 2, 3, 4, 5), with brief summaries and insightful discussions being presented.
3D Shape Classification:

There are currently three families of methods:

  1. Multi-view based methods project an unstructured point cloud into 2D images: the 3D point cloud is first projected into multiple 2D views, and learning is then performed on the 2D images.
  2. Volumetric-based methods convert a point cloud into a 3D volumetric representation, and 3D CNNs are then applied in 3D space.
  3. Point-based networks directly work on raw point clouds.

    (1) Pointwise MLP Methods


(2)Convolution-based Methods


(3)Graph-based Methods


(4) Hierarchical Data Structure-based Methods

(5)Other Methods

Summary:

  1. Pointwise MLP networks usually serve as the basic building block for other types of networks to learn pointwise features.
  2. As a standard deep learning architecture, convolution-based networks can achieve superior performance on irregular 3D point clouds. More attention should be paid to both discrete and continuous convolution networks for irregular data.
  3. Due to their inherent strength in handling irregular data, graph-based networks have attracted increasingly more attention in recent years. However, it is still challenging to extend graph-based networks in the spectral domain to various graph structures.
3D Object detection and tracking:

I. 3D Object Detection

3D object detection methods can be divided into two categories: region-proposal-based and single-shot methods.

Several milestone methods are presented.


1.Region Proposal-based Methods

These methods first propose several candidate regions (called proposals) that may contain objects, and then extract region-wise features to determine the category label of each proposal.

According to their object proposal generation approach, these methods can further be divided into three categories: multi-view based, segmentation-based and frustum-based methods.


(1)Multi-view based Methods

These methods generate proposals from different views (e.g., LiDAR front view, Bird's Eye View (BEV), and image) and usually have a rather high computational cost.

The classic algorithm of this family is MV3D (Tsinghua, 2017). It first runs an accurate 2D CNN on the BEV view to generate ROIs, projects those ROIs into the other views, and then combines the proposals from the different views to produce the 3D bounding boxes. Although its accuracy is high, it is particularly slow; subsequent work in this area improves it along two directions (mostly raising detection speed).

First, several methods have been proposed to efficiently fuse the information of different modalities. Second, different methods have been investigated to extract robust representations of the input data.

Representative algorithms and papers:

(1) MV3D (Multi-view 3D object detection network for autonomous driving)

(2) Joint 3D proposal generation and object detection from view aggregation

(3) Deep continuous fusion for multi-sensor 3D object detection

(4) Multi-task multi-sensor fusion for 3D object detection

(5) PIXOR: Real-time 3D object detection from point clouds

(6) Fast and furious: Real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net

(7) RT3D: Real-time 3D vehicle detection in lidar point cloud for autonomous driving

(2)Segmentation-based Methods

These methods first leverage existing semantic segmentation techniques to remove most background points, and then generate a large number of high-quality proposals on the foreground points to save computation.

These methods achieve higher object recall rates and are more suitable for complicated scenes with highly occluded and crowded objects.

Representative networks:

(1) PointRCNN. Specifically, they directly segmented 3D point clouds to obtain foreground points and then fused semantic features and local spatial features to produce high-quality 3D boxes.

(2) PointRGCN applies a GCN (graph convolutional network) after the RPN of PointRCNN to perform 3D object detection.

(3) STD: Sparse-to-dense 3D object detector for point cloud associates a spherical anchor with each point and then uses each point's semantic score to remove redundant anchors, giving the network a higher recall rate.

(3)Frustum-based Methods

These methods first leverage existing 2D object detectors to generate 2D candidate regions of objects and then extract a 3D frustum proposal for each 2D candidate region. (Detection is done in the 2D image first; the 2D proposals, together with the camera parameters, then generate the 3D proposals.) (This family depends on the 2D detection accuracy and does not handle occlusion well.)
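The frustum construction can be sketched as follows. This is a hypothetical helper of my own, assuming a pinhole camera with intrinsics K and a frustum expressed in the camera frame; real pipelines such as F-PointNet additionally apply the camera-to-LiDAR transform:

```python
import numpy as np

def frustum_corners(box2d, K, z_near, z_far):
    """Back-project a 2D detection box into a 3D frustum (camera frame).
    box2d = (u_min, v_min, u_max, v_max); K = 3x3 camera intrinsics.
    Returns the 8 frustum corners: the 4 box corners lifted along their
    viewing rays to depth z_near and to depth z_far."""
    u0, v0, u1, v1 = box2d
    pix = np.array([[u0, v0, 1.0], [u1, v0, 1.0],
                    [u1, v1, 1.0], [u0, v1, 1.0]])   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                  # normalized viewing rays
    return np.concatenate([rays * z_near, rays * z_far])

# Toy camera: focal length 100 px, principal point at (50, 50).
K = np.array([[100., 0, 50], [0, 100, 50], [0, 0, 1]])
corners = frustum_corners((50, 50, 150, 150), K, z_near=1.0, z_far=10.0)
```

The 3D points falling inside this frustum are then cropped out and handed to a point network for amodal box estimation.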

Classic networks:

  1. F-PointNet is the pioneering work of this family. (It generates a frustum proposal for each 2D region and applies PointNet (or PointNet++) to learn point cloud features of each 3D frustum for amodal 3D box estimation.)
  2. Point-SENet: adaptively highlights useful features and suppresses less informative ones.
  3. PointSIFT: strengthens the capture of orientation information in point clouds, obtaining strong robustness to shape scaling and further improving detection performance.
  4. PointFusion:
  5. RoarNet:
  6. Frustum ConvNet:
  7. Patch refinement - localized 3D object detection:

(4)Other Methods

  1. 3D IoU loss

    integrated the IoU of two 3D rotated bounding boxes into several state-of-the-art detectors to achieve consistent performance improvement.

  2. Fast Point R-CNN:

    proposed a two-stage network architecture that uses both point cloud and voxel representations (matching PointRCNN's accuracy at higher speed).

  3. PV-RCNN:

    ranked first on the KITTI car detection leaderboard (at the time of the survey).

  4. VoteNet:
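For intuition, the 3D IoU in item 1 reduces, for axis-aligned boxes, to a simple overlap-volume ratio (the paper's loss uses rotated boxes, which additionally require polygon clipping; this simplified version only shows the idea):

```python
import numpy as np

def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes given as
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = np.maximum(a[:3], b[:3])                     # intersection min corner
    hi = np.minimum(a[3:], b[3:])                     # intersection max corner
    inter = np.prod(np.clip(hi - lo, 0.0, None))      # overlap volume (>= 0)
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter)            # inter / union

a = np.array([0., 0, 0, 2, 2, 2])   # 2x2x2 cube at the origin
b = np.array([1., 1, 1, 3, 3, 3])   # shifted cube, 1x1x1 overlap
iou = iou_3d_axis_aligned(a, b)     # 1 / (8 + 8 - 1) = 1/15
```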

2.Single Shot Methods

These methods directly predict class probabilities and regress 3D bounding boxes of objects using a single-stage network. They do not need region proposal generation and post-processing. As a result, they can run at a high speed.

According to the type of input data, single shot methods can be divided into three categories: BEV-based, discretization based and point-based methods.

(1)BEV-based Methods.

These methods mainly take BEV representation as their input.

(2)Discretization-based Methods.

These methods convert a point cloud into a regular discrete representation, and then apply CNN to predict both categories and 3D boxes of objects.
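The discretization step can be sketched as a simple occupancy-grid voxelizer (a minimal illustration of my own; real detectors such as VoxelNet store per-voxel point features rather than a binary flag):

```python
import numpy as np

def voxelize_occupancy(points, voxel_size, grid_shape, origin=(0.0, 0.0, 0.0)):
    """Convert a raw point cloud into a dense binary occupancy grid.
    Points falling outside the grid are dropped; a 3D CNN can then be
    applied to the resulting regular volume."""
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    grid = np.zeros(grid_shape, dtype=np.uint8)
    grid[tuple(idx[keep].T)] = 1          # mark occupied voxels
    return grid

pts = np.array([[0.1, 0.1, 0.1],          # lands in voxel (0, 0, 0)
                [0.9, 0.9, 0.9],          # lands in voxel (1, 1, 1)
                [5.0, 5.0, 5.0]])         # outside the grid, dropped
grid = voxelize_occupancy(pts, voxel_size=0.5, grid_shape=(2, 2, 2))
```

The `voxel_size`/`grid_shape` trade-off here is exactly the resolution dilemma discussed for discretization-based methods: finer voxels keep more geometry but blow up memory and compute.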

(3)Point-based Methods:

These methods directly take raw point clouds as their inputs.


II. 3D Object Tracking

Given the location of an object in the first frame, the task of object tracking is to estimate its state in subsequent frames. Since 3D object tracking can use the rich geometric information in point clouds, it is expected to overcome several drawbacks faced by image-based tracking, including occlusion, illumination and scale variation.

III. 3D scene flow estimation

Scene flow is the 3D counterpart of optical flow: it describes how every point in an image/point cloud moves between two consecutive frames. Research on scene flow is still at the laboratory stage; due to the lack of real-world data (labeling is too costly) and of objective evaluation metrics, it remains far from engineering application. The huge computational cost is another bottleneck.

2D optical flow estimation is one of the main tools for video understanding.

Optical flow: the displacement, represented as a 2D vector, of pixels belonging to the same object from one video frame to the next; it is a way of describing how pixels move between images over time.

In 2D, deep-learning optical flow estimation is typically introduced through FlowNet/FlowNet2.0.

Some introductory articles on 3D scene flow estimation: https://zhuanlan.zhihu.com/p/85663856

Papers worth consulting at this stage:

(1)FlowNet3D

(2)FlowNet3D++

(3)HPLFlowNet

(4)PointRNN

(5)MeteorNet

(6)Just go with the flow

Summary

(1) Region-proposal-based methods are currently more common and perform better.

(2) 3D object detectors have two weaknesses: poor recognition of distant objects, and insufficient use of the texture information in images.

(3) Multi-task joint learning is a future research direction.

(4) 3D object tracking and scene flow estimation are emerging research topics.

3D Point Cloud Segmentation:

According to the segmentation granularity, 3D point cloud segmentation methods can be classified into three categories: semantic segmentation (scene level), instance segmentation (object level) and part segmentation (part level).

1) 3D Semantic Segmentation

Semantic segmentation separates out all objects belonging to the same class.

There are four paradigms for semantic segmentation: projection-based, discretization-based, point-based, and hybrid methods.

The first step of both projection-based and discretization-based methods is to convert the irregular point cloud into a regular representation, whereas point-based methods work directly on the raw point cloud.

  1. Projection-based Methods

    These methods usually project a 3D point cloud into 2D images, including multi-view and spherical images.

    (1) Multi-view Representation

    These methods generally pick several virtual camera viewpoints around the original 3D point cloud, project it onto multiple 2D planes, learn on those planes, and finally fuse the re-projection scores from the different views to obtain the final semantic label of each point.

    Overall, the performance of multi-view segmentation methods is sensitive to viewpoint selection and occlusions. Besides, these methods have not fully exploited the underlying geometric and structural information, as the projection step inevitably introduces information loss.

    (2)Spherical Representation

    (I have not yet fully understood this part.)

  2. Discretization-based Methods

    These methods usually convert a point cloud into a dense/sparse discrete representation, such as volumetric and sparse permutohedral lattices.

    (1)Dense Discretization Representation

    Early methods usually voxelized the point clouds as dense grids and then leveraged standard 3D convolutions. Later work incorporated fully convolutional approaches, improving the results markedly.

    Since the voxelized point cloud admits standard 3D convolutions, detection accuracy is fairly good, but the voxelization step by its nature loses some geometric information. A high resolution usually brings enormous computation and memory costs, while a low resolution loses some detail.

    (2)Sparse Discretization Representation

    Sparse discretized representation. I have not fully understood this yet; roughly, it sparsifies the voxel representation above, reducing computation and memory.

  3. Hybrid Methods

    Jointly learn from 2D images and 3D data.

  4. Point-based Methods

    Point-based networks directly work on irregular point clouds. However, point clouds are orderless and unstructured, making it infeasible to directly apply standard CNNs. To this end, the pioneering work PointNet is proposed to learn per-point features using shared MLPs and global features using symmetrical pooling functions. Based on PointNet, a series of point-based networks have been proposed recently. Overall, these methods can be roughly divided into pointwise MLP methods, point convolution methods, RNN-based methods, and graph-based methods.

    (1)Pointwise MLP Methods

    These methods extend the use of MLPs for feature extraction, first proposed in PointNet.

    (2)Point Convolution Methods.

    These methods tend to propose effective convolution operators for point clouds.

    (3)RNN-based Methods

    To capture the inherent contextual features of point clouds, recurrent neural networks (RNNs) are applied.

    (4)Graph-based Methods

    Graph neural network architectures are applied.


2) Instance Segmentation

Beyond separating objects by class, we also need to know specifically whether an instance is person A, person B, or person C.


(1)Proposal-based Methods

These methods convert the instance segmentation problem into two sub-tasks: 3D object detection and instance mask prediction.

(2)Proposal-free Methods

they usually consider instance segmentation as a subsequent clustering step after semantic segmentation.

Pioneering work: SGPN.

2D-3D joint learning

3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans(CVPR-2019)

homepage. paper. video.

Overview:

This paper introduces a new neural network (3D-SIS). The network proposes 2D-3D joint learning for the first time, learning from both geometry and RGB to improve instance segmentation; at the same time, the network is a fully convolutional end-to-end network, so it can run efficiently in large 3D environments.

input: (1) 3D scan geometry features (2) 2D RGB input features

output: (1) 3D object bounding boxes (2) class labels (3) instance masks

Contributions:

This paper introduces 3D-SIS, a new approach for 3D semantic instance segmentation of RGB-D scans, which is trained in an end-to-end fashion to detect object instances and jointly learn features from RGB and geometry data.

Core idea of the algorithm:

The core idea of the method is to jointly learn features from RGB and geometry data using multi-view RGB-D input recorded with commodity RGB-D sensors, thus enabling accurate instance predictions.

Concrete implementation pipeline:

I. Input preprocessing:

(1) Obtain the scene geometry via BundleFusion (represented as voxels, a TSDF).

(2) Select 2D images that cover the environment as completely as possible, extract 2D features from them with 2D convolutions, and back-project these features into the 3D voxels of the indoor scene, so that each voxel carries color features.

II. 3D object detection with the 3D Detection Backbone

(1) Apply 3D convolutions separately to the 3D geometry and the 3D color features, then fuse the resulting feature maps.

(2) Run an anchor-based 3D RPN on the fused features from (1) to generate the box locations of the 3D objects.

(3) Combine each 3D box from (2) with the 3D features inside it and apply a 3D ROI layer to classify the object in each box, completing detection and classification.

III. The 3D Mask Backbone assigns a mask to each detected 3D object.
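The back-projection of 2D features into the voxel grid (step I.(2)) can be sketched as follows. This is a simplified illustration of my own: voxel centers are assumed to already be in the camera frame and the pixel lookup is nearest-neighbor, whereas 3D-SIS itself also applies the camera pose (extrinsics) of each view:

```python
import numpy as np

def backproject_features(feat2d, K, voxel_centers):
    """Project each voxel center into the image with the pinhole model
    and copy the 2D feature at that pixel into the voxel. Voxels that
    project outside the image (or behind the camera) get zeros."""
    h, w, c = feat2d.shape
    uvw = voxel_centers @ K.T                        # perspective projection
    uv = np.rint(uvw[:, :2] / uvw[:, 2:3]).astype(int)
    out = np.zeros((len(voxel_centers), c))
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < h) & (uvw[:, 2] > 0))
    out[inside] = feat2d[uv[inside, 1], uv[inside, 0]]
    return out

# Toy camera and a 100x100 feature map with a 2-channel feature at (50, 50).
K = np.array([[100., 0, 50], [0, 100, 50], [0, 0, 1]])
feat2d = np.zeros((100, 100, 2))
feat2d[50, 50] = [1.0, 2.0]
centers = np.array([[0., 0, 1],      # projects to pixel (50, 50)
                    [10., 0, 1]])    # projects outside the image
out = backproject_features(feat2d, K, centers)
```

Repeating this for every RGB view and max-pooling per voxel is, roughly, how the color-feature volume fed to the 3D backbone is built.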

Experiments & results:
Limitations:

(1) The input is complex, coming in both 3D and 2D forms.

(2) Heavy use of 3D convolutions makes training and inference slow, so real-time operation may not be achievable.

(3) The network structure is not compact: two separate 3D CNNs are applied to the two inputs, and parts of them could in theory be shared.
