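SSD (Single Shot MultiBox Detector) is an end-to-end object detection model. A bit of gossip first: it comes partly from the Google camp, and one of its co-authors, Christian Szegedy, is also the first author of GoogLeNet. The model aims at fast detection with high accuracy. It needs no separate stage for computing candidate bounding boxes, yet it reaches comparable accuracy with a large speedup: the paper reports 58 FPS at 72.1% mAP (SSD300 on the PASCAL VOC2007 test set).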
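Let's first look at the model as a whole. Its bottom layers are a classic VGG16 network (which can be replaced by, e.g., ResNet). The convolutional layer conv4_3 (normalized as conv4_3_norm), the layer fc7 (a fully connected layer in the original VGG16), the three convolutional layers further up (conv6_2, conv7_2 and conv8_2) and pool6 each branch into three kinds of nodes: mbox_conf, mbox_loc and priorbox (call them X nodes). For each kind, a Concat layer then merges the X nodes coming from the different layers, and the concatenated results are passed together to the final detection output.
For more detail, below is the log of one forward pass, listing the layers in the order they are computed.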
[INFO 2016-08-30 21:58:39.619143 21429 net.cpp:540] Forwarding data
[INFO 2016-08-30 21:58:39.622481 21429 net.cpp:540] Forwarding data_data_0_split
[INFO 2016-08-30 21:58:39.622514 21429 net.cpp:540] Forwarding conv1_1
[INFO 2016-08-30 21:58:39.627096 21429 net.cpp:540] Forwarding relu1_1
[INFO 2016-08-30 21:58:39.627473 21429 net.cpp:540] Forwarding conv1_2
[INFO 2016-08-30 21:58:39.631721 21429 net.cpp:540] Forwarding relu1_2
[INFO 2016-08-30 21:58:39.631757 21429 net.cpp:540] Forwarding pool1
[INFO 2016-08-30 21:58:39.632096 21429 net.cpp:540] Forwarding conv2_1
[INFO 2016-08-30 21:58:39.634774 21429 net.cpp:540] Forwarding relu2_1
[INFO 2016-08-30 21:58:39.634809 21429 net.cpp:540] Forwarding conv2_2
[INFO 2016-08-30 21:58:39.639045 21429 net.cpp:540] Forwarding relu2_2
[INFO 2016-08-30 21:58:39.639080 21429 net.cpp:540] Forwarding pool2
[INFO 2016-08-30 21:58:39.639394 21429 net.cpp:540] Forwarding conv3_1
[INFO 2016-08-30 21:58:39.642501 21429 net.cpp:540] Forwarding relu3_1
[INFO 2016-08-30 21:58:39.642535 21429 net.cpp:540] Forwarding conv3_2
[INFO 2016-08-30 21:58:39.647202 21429 net.cpp:540] Forwarding relu3_2
[INFO 2016-08-30 21:58:39.647235 21429 net.cpp:540] Forwarding conv3_3
[INFO 2016-08-30 21:58:39.650738 21429 net.cpp:540] Forwarding relu3_3
[INFO 2016-08-30 21:58:39.650770 21429 net.cpp:540] Forwarding pool3
[INFO 2016-08-30 21:58:39.651074 21429 net.cpp:540] Forwarding conv4_1
[INFO 2016-08-30 21:58:39.655285 21429 net.cpp:540] Forwarding relu4_1
[INFO 2016-08-30 21:58:39.655323 21429 net.cpp:540] Forwarding conv4_2
[INFO 2016-08-30 21:58:39.660395 21429 net.cpp:540] Forwarding relu4_2
[INFO 2016-08-30 21:58:39.660429 21429 net.cpp:540] Forwarding conv4_3
[INFO 2016-08-30 21:58:39.665523 21429 net.cpp:540] Forwarding relu4_3
[INFO 2016-08-30 21:58:39.665555 21429 net.cpp:540] Forwarding conv4_3_relu4_3_0_split
[INFO 2016-08-30 21:58:39.665570 21429 net.cpp:540] Forwarding pool4
[INFO 2016-08-30 21:58:39.665881 21429 net.cpp:540] Forwarding conv5_1
[INFO 2016-08-30 21:58:39.668714 21429 net.cpp:540] Forwarding relu5_1
[INFO 2016-08-30 21:58:39.668748 21429 net.cpp:540] Forwarding conv5_2
[INFO 2016-08-30 21:58:39.671761 21429 net.cpp:540] Forwarding relu5_2
[INFO 2016-08-30 21:58:39.671807 21429 net.cpp:540] Forwarding conv5_3
[INFO 2016-08-30 21:58:39.675269 21429 net.cpp:540] Forwarding relu5_3
[INFO 2016-08-30 21:58:39.675302 21429 net.cpp:540] Forwarding pool5
[INFO 2016-08-30 21:58:39.675624 21429 net.cpp:540] Forwarding fc6
[INFO 2016-08-30 21:58:39.685935 21429 net.cpp:540] Forwarding relu6
[INFO 2016-08-30 21:58:39.685971 21429 net.cpp:540] Forwarding fc7
[INFO 2016-08-30 21:58:39.688531 21429 net.cpp:540] Forwarding relu7
[INFO 2016-08-30 21:58:39.688565 21429 net.cpp:540] Forwarding fc7_relu7_0_split
[INFO 2016-08-30 21:58:39.688580 21429 net.cpp:540] Forwarding conv6_1
[INFO 2016-08-30 21:58:39.691439 21429 net.cpp:540] Forwarding conv6_1_relu
[INFO 2016-08-30 21:58:39.691473 21429 net.cpp:540] Forwarding conv6_2
[INFO 2016-08-30 21:58:39.695135 21429 net.cpp:540] Forwarding conv6_2_relu
[INFO 2016-08-30 21:58:39.695169 21429 net.cpp:540] Forwarding conv6_2_conv6_2_relu_0_split
[INFO 2016-08-30 21:58:39.695183 21429 net.cpp:540] Forwarding conv7_1
[INFO 2016-08-30 21:58:39.698765 21429 net.cpp:540] Forwarding conv7_1_relu
[INFO 2016-08-30 21:58:39.698796 21429 net.cpp:540] Forwarding conv7_2
[INFO 2016-08-30 21:58:39.701938 21429 net.cpp:540] Forwarding conv7_2_relu
[INFO 2016-08-30 21:58:39.702193 21429 net.cpp:540] Forwarding conv7_2_conv7_2_relu_0_split
[INFO 2016-08-30 21:58:39.702220 21429 net.cpp:540] Forwarding conv8_1
[INFO 2016-08-30 21:58:39.704677 21429 net.cpp:540] Forwarding conv8_1_relu
[INFO 2016-08-30 21:58:39.704716 21429 net.cpp:540] Forwarding conv8_2
[INFO 2016-08-30 21:58:39.707798 21429 net.cpp:540] Forwarding conv8_2_relu
[INFO 2016-08-30 21:58:39.707839 21429 net.cpp:540] Forwarding conv8_2_conv8_2_relu_0_split
[INFO 2016-08-30 21:58:39.707859 21429 net.cpp:540] Forwarding pool6
[INFO 2016-08-30 21:58:39.707926 21429 net.cpp:540] Forwarding pool6_pool6_0_split
[INFO 2016-08-30 21:58:39.707947 21429 net.cpp:540] Forwarding conv4_3_norm
[INFO 2016-08-30 21:58:39.711788 21429 net.cpp:540] Forwarding conv4_3_norm_conv4_3_norm_0_split
[INFO 2016-08-30 21:58:39.711818 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc
[INFO 2016-08-30 21:58:39.714972 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc_perm
[INFO 2016-08-30 21:58:39.717313 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_loc_flat
[INFO 2016-08-30 21:58:39.717339 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf
[INFO 2016-08-30 21:58:39.724395 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf_perm
[INFO 2016-08-30 21:58:39.731096 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_conf_flat
[INFO 2016-08-30 21:58:39.731127 21429 net.cpp:540] Forwarding conv4_3_norm_mbox_priorbox
[INFO 2016-08-30 21:58:39.731290 21429 net.cpp:540] Forwarding fc7_mbox_loc
[INFO 2016-08-30 21:58:39.733963 21429 net.cpp:540] Forwarding fc7_mbox_loc_perm
[INFO 2016-08-30 21:58:39.737503 21429 net.cpp:540] Forwarding fc7_mbox_loc_flat
[INFO 2016-08-30 21:58:39.737527 21429 net.cpp:540] Forwarding fc7_mbox_conf
[INFO 2016-08-30 21:58:39.746902 21429 net.cpp:540] Forwarding fc7_mbox_conf_perm
[INFO 2016-08-30 21:58:39.750918 21429 net.cpp:540] Forwarding fc7_mbox_conf_flat
[INFO 2016-08-30 21:58:39.750946 21429 net.cpp:540] Forwarding fc7_mbox_priorbox
[INFO 2016-08-30 21:58:39.751056 21429 net.cpp:540] Forwarding conv6_2_mbox_loc
[INFO 2016-08-30 21:58:39.753976 21429 net.cpp:540] Forwarding conv6_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.756206 21429 net.cpp:540] Forwarding conv6_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.756239 21429 net.cpp:540] Forwarding conv6_2_mbox_conf
[INFO 2016-08-30 21:58:39.763130 21429 net.cpp:540] Forwarding conv6_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.764664 21429 net.cpp:540] Forwarding conv6_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.764689 21429 net.cpp:540] Forwarding conv6_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.764760 21429 net.cpp:540] Forwarding conv7_2_mbox_loc
[INFO 2016-08-30 21:58:39.768630 21429 net.cpp:540] Forwarding conv7_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.772903 21429 net.cpp:540] Forwarding conv7_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.772927 21429 net.cpp:540] Forwarding conv7_2_mbox_conf
[INFO 2016-08-30 21:58:39.777669 21429 net.cpp:540] Forwarding conv7_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.781180 21429 net.cpp:540] Forwarding conv7_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.781205 21429 net.cpp:540] Forwarding conv7_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.781263 21429 net.cpp:540] Forwarding conv8_2_mbox_loc
[INFO 2016-08-30 21:58:39.783634 21429 net.cpp:540] Forwarding conv8_2_mbox_loc_perm
[INFO 2016-08-30 21:58:39.788920 21429 net.cpp:540] Forwarding conv8_2_mbox_loc_flat
[INFO 2016-08-30 21:58:39.788944 21429 net.cpp:540] Forwarding conv8_2_mbox_conf
[INFO 2016-08-30 21:58:39.793294 21429 net.cpp:540] Forwarding conv8_2_mbox_conf_perm
[INFO 2016-08-30 21:58:39.797371 21429 net.cpp:540] Forwarding conv8_2_mbox_conf_flat
[INFO 2016-08-30 21:58:39.797397 21429 net.cpp:540] Forwarding conv8_2_mbox_priorbox
[INFO 2016-08-30 21:58:39.797449 21429 net.cpp:540] Forwarding pool6_mbox_loc
[INFO 2016-08-30 21:58:39.800542 21429 net.cpp:540] Forwarding pool6_mbox_loc_perm
[INFO 2016-08-30 21:58:39.804468 21429 net.cpp:540] Forwarding pool6_mbox_loc_flat
[INFO 2016-08-30 21:58:39.804493 21429 net.cpp:540] Forwarding pool6_mbox_conf
[INFO 2016-08-30 21:58:39.808717 21429 net.cpp:540] Forwarding pool6_mbox_conf_perm
[INFO 2016-08-30 21:58:39.812292 21429 net.cpp:540] Forwarding pool6_mbox_conf_flat
[INFO 2016-08-30 21:58:39.812317 21429 net.cpp:540] Forwarding pool6_mbox_priorbox
[INFO 2016-08-30 21:58:39.812382 21429 net.cpp:540] Forwarding mbox_loc
[INFO 2016-08-30 21:58:39.812604 21429 net.cpp:540] Forwarding mbox_conf
[INFO 2016-08-30 21:58:39.812834 21429 net.cpp:540] Forwarding mbox_priorbox
[INFO 2016-08-30 21:58:39.819844 21429 net.cpp:540] Forwarding mbox_conf_reshape
[INFO 2016-08-30 21:58:39.819871 21429 net.cpp:540] Forwarding mbox_conf_softmax
[INFO 2016-08-30 21:58:39.820596 21429 net.cpp:540] Forwarding mbox_conf_flatten
[INFO 2016-08-30 21:58:39.820647 21429 net.cpp:540] Forwarding detection_out
[INFO 2016-08-30 21:58:39.832866 21429 net.cpp:540] Forwarding detection_eval
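The log shows a recurring pattern: each source layer's *_mbox_loc and *_mbox_conf map goes through *_perm and *_flat before the mbox_loc / mbox_conf concatenation. Below is a minimal NumPy sketch of that Permute, Flatten, Concat sequence for the location branch. The feature-map sizes and the number of default boxes per location are illustrative assumptions, not values read from the log.

```python
import numpy as np

def flatten_predictions(pred):
    """Mimic Permute + Flatten: (N, C, H, W) -> (N, H*W*C).

    Moving channels last keeps all values predicted at one spatial location
    (for all of its default boxes) contiguous after flattening.
    """
    n = pred.shape[0]
    return pred.transpose(0, 2, 3, 1).reshape(n, -1)

def concat_heads(preds):
    """Mimic the Concat layer (mbox_loc / mbox_conf) along axis 1."""
    return np.concatenate([flatten_predictions(p) for p in preds], axis=1)

rng = np.random.default_rng(0)
# Hypothetical source layers: (feature-map side, default boxes per location).
sources = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
batch = 2
# Each location head outputs k * 4 channels: 4 offsets per default box.
loc_maps = [rng.random((batch, k * 4, s, s)) for s, k in sources]

mbox_loc = concat_heads(loc_maps)
num_boxes = sum(s * s * k for s, k in sources)
print(mbox_loc.shape, num_boxes)  # (2, 34928) 8732

# One row of 4 offsets per default box, across all source layers.
per_box = mbox_loc.reshape(batch, num_boxes, 4)
```

Permuting before flattening is what makes the final reshape valid: every group of 4 consecutive values belongs to one default box, so the concatenated tensor lines up box by box with the concatenated priorbox output.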
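The SSD network uses many small convolution kernels (1x1, 3x3), not only for classification but also for regressing bounding-box positions. Separate filters (predictors) handle boxes of different aspect ratios, and these filters are applied to several later feature maps to detect objects at multiple scales.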
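To make "small kernels do both classification and box regression" concrete: a 3x3 kernel slid over an m x n x p feature map outputs, at each of the m x n positions, k x (c + 4) numbers, namely c class scores and 4 offsets for each of the k default boxes. The naive NumPy sketch below assumes stride 1, zero padding, random weights, and illustrative values of m, n, p, k and c.

```python
import numpy as np

def conv3x3_head(fmap, weight, bias):
    """Naive 3x3 convolution (stride 1, zero padding 1) used as a predictor.

    fmap:   (p, m, n) feature map
    weight: (out, p, 3, 3) kernels
    bias:   (out,)
    returns (out, m, n): one prediction vector per feature-map location
    """
    _, m, n = fmap.shape
    padded = np.pad(fmap, ((0, 0), (1, 1), (1, 1)))
    out = np.empty((weight.shape[0], m, n))
    for i in range(m):
        for j in range(n):
            window = padded[:, i:i + 3, j:j + 3]  # (p, 3, 3) patch centred on (i, j)
            out[:, i, j] = np.tensordot(weight, window, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

rng = np.random.default_rng(0)
m, n, p = 8, 8, 16   # hypothetical 8x8 feature map with 16 channels
k, c = 4, 21         # 4 default boxes per location, 21 classes (20 + background)
fmap = rng.standard_normal((p, m, n))
w_conf = rng.standard_normal((k * c, p, 3, 3))
w_loc = rng.standard_normal((k * 4, p, 3, 3))

# Confidence head and location head share the input but have separate kernels.
conf = conv3x3_head(fmap, w_conf, np.zeros(k * c)).reshape(k, c, m, n)
loc = conv3x3_head(fmap, w_loc, np.zeros(k * 4)).reshape(k, 4, m, n)
```

Because the same kernels are shared across all positions, the cost grows with m x n rather than with the number of windows one would crop from the image.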
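SSD defines a set of default boxes, four per location: a tall one, a wide one, a large square and a small square. They are tiled over every position of feature maps of different sizes (e.g. 4x4 and 8x8); in other words, applying the predictors as convolutions covers all m*n positions of an m*n*p feature map. During training, these boxes are matched against the ground-truth boxes. For each box the network predicts 4 offsets and c class confidences, and the matching against the ground truth decides which boxes are positives and which are negatives. The overall loss of the model is a weighted sum of the localization loss and the confidence loss. At inference time, non-maximum suppression over the predicted boxes yields the final detections.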
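The loss described above can be sketched following the formulation in the SSD paper: L = (L_conf + alpha * L_loc) / N, where N is the number of matched (positive) default boxes, L_loc is a Smooth L1 loss on the 4 encoded offsets of the positives, and L_conf is a softmax cross-entropy over the c classes. This is a minimal NumPy sketch, not the reference implementation. It assumes boxes in (cx, cy, w, h) form, alpha = 1, and a label array already produced by matching and hard negative mining, with a hypothetical value -1 marking boxes that are left out of the loss.

```python
import numpy as np

def encode(gt, default):
    """Regression targets of ground-truth boxes relative to default boxes.

    Both arrays are (K, 4) in (cx, cy, w, h) form. Centre shifts are scaled
    by the default box size; width and height use a log ratio.
    """
    return np.stack([
        (gt[:, 0] - default[:, 0]) / default[:, 2],
        (gt[:, 1] - default[:, 1]) / default[:, 3],
        np.log(gt[:, 2] / default[:, 2]),
        np.log(gt[:, 3] / default[:, 3]),
    ], axis=1)

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def softmax_cross_entropy(logits, labels):
    """Per-box cross-entropy; logits (K, c), labels (K,) with 0 = background."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def ssd_loss(loc_pred, conf_pred, labels, gt_boxes, default_boxes, alpha=1.0):
    """(confidence loss + alpha * localization loss) / number of positives.

    labels:   (K,) class per default box; 0 = negative, -1 = ignored (hypothetical)
    gt_boxes: (K, 4) matched ground truth per box, only read where labels > 0
    """
    pos = labels > 0
    n_pos = int(pos.sum())
    if n_pos == 0:
        return 0.0
    used = labels >= 0
    l_conf = softmax_cross_entropy(conf_pred[used], labels[used]).sum()
    targets = encode(gt_boxes[pos], default_boxes[pos])
    l_loc = smooth_l1(loc_pred[pos] - targets).sum()
    return (l_conf + alpha * l_loc) / n_pos
```

Only positives contribute to the localization term, while both positives and the selected negatives contribute to the confidence term; dividing by N keeps the loss scale independent of how many boxes were matched.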
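Using boxes of different shapes on feature maps of several resolutions discretizes the space of possible box parameters, which keeps the computation efficient. The ground-truth information, both class and location, must be explicitly assigned to specific network outputs, so that the loss function and back-propagation are end-to-end. Training therefore has to pair ground-truth boxes with default boxes: a default box is matched to a ground truth whenever their Jaccard overlap exceeds 0.5, and every ground-truth box must be matched to at least one default box (its best-overlapping one). In addition, when there are many candidate boxes, most of them are negatives, so negatives far outnumber positives. SSD therefore sorts the negative boxes by their confidence loss and keeps only the top ones, so that the ratio of negatives to positives is at most 3:1.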
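The matching rule and the 3:1 negative selection above can be sketched in NumPy as follows. Assumptions: boxes are given as (xmin, ymin, xmax, ymax); the per-box confidence loss used to rank negatives is supplied by the caller (the input name neg_conf_loss is hypothetical); when two ground truths share the same best default box, the later one simply wins.

```python
import numpy as np

def iou(boxes_a, boxes_b):
    """Pairwise Jaccard overlap; boxes as (xmin, ymin, xmax, ymax). Returns (A, B)."""
    lt = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    rb = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    inter = np.clip(rb - lt, 0, None).prod(axis=2)
    area_a = (boxes_a[:, 2:] - boxes_a[:, :2]).prod(axis=1)
    area_b = (boxes_b[:, 2:] - boxes_b[:, :2]).prod(axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match(default_boxes, gt_boxes, threshold=0.5):
    """Per default box: index of its matched ground truth, or -1 if unmatched."""
    overlaps = iou(default_boxes, gt_boxes)                    # (D, G)
    matched = np.where(overlaps.max(axis=1) > threshold, overlaps.argmax(axis=1), -1)
    # Every ground-truth box keeps at least its best-overlapping default box.
    matched[overlaps.argmax(axis=0)] = np.arange(len(gt_boxes))
    return matched

def hard_negative_mining(neg_conf_loss, is_positive, neg_pos_ratio=3):
    """Keep the negatives with the highest confidence loss, at most ratio x positives."""
    n_neg = min(neg_pos_ratio * int(is_positive.sum()), int((~is_positive).sum()))
    loss = np.where(is_positive, -np.inf, neg_conf_loss)       # positives sort last
    keep = np.zeros_like(is_positive)
    keep[np.argsort(-loss)[:n_neg]] = True
    return keep
```

For example, a ground truth that overlaps no default box by more than 0.5 is still assigned to its single best default box, so it always produces at least one positive training target.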
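How does SSD detect objects at multiple scales? Lower-level feature maps capture fine image detail and thus improve semantic segmentation quality, while higher-level feature maps help smooth the segmentation result. SSD therefore uses both lower and higher feature maps for detection. Feature maps from different layers have different receptive field sizes, which is crucial here; see Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Object detectors emerge in deep scene CNNs. In: ICLR (2015). However, there is no need to build boxes of several sizes on one feature map: each feature map only learns to detect objects of one scale, so all boxes on a given layer share a single scale. For example, the boxes on the 8x8 feature map cannot detect the large dog (see the figure below). From the lowest to the highest layer, the box scale is spaced evenly between 0.2 and 0.95. To handle different aspect ratios, each layer additionally generates 6 boxes with aspect ratios {1, 1', 2, 3, 1/2, 1/3}, where 1' is an extra square box whose scale lies between this layer's scale and the next layer's.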
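The scale and aspect-ratio rules can be written out as a small sketch following the SSD paper: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), box width s_k * sqrt(a) and height s_k / sqrt(a) for aspect ratio a, and the extra square box 1' with scale sqrt(s_k * s_{k+1}). The text above does not say what s_{k+1} is for the topmost layer; extrapolating the linear spacing one step is an assumption made here only for illustration.

```python
import math

def layer_scales(num_layers, s_min=0.2, s_max=0.95):
    """Box scales spaced evenly from s_min (lowest layer) to s_max (highest layer)."""
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]

def default_box_shapes(s_k, s_next, ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)):
    """(width, height) of one layer's default boxes, relative to the image size.

    Aspect ratio a gives w = s_k * sqrt(a), h = s_k / sqrt(a); the extra
    square box 1' uses the scale sqrt(s_k * s_next).
    """
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    extra = math.sqrt(s_k * s_next)
    shapes.insert(1, (extra, extra))  # order: 1, 1', 2, 3, 1/2, 1/3
    return shapes

def default_boxes(fmap_size, shapes):
    """(cx, cy, w, h) for every shape at the centre of each cell of the map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            boxes.extend((cx, cy, w, h) for w, h in shapes)
    return boxes

scales = layer_scales(6)
# Assumption: the topmost layer's "next" scale is extrapolated linearly.
next_scales = scales[1:] + [2 * scales[-1] - scales[-2]]
for s, s_next in zip(scales, next_scales):
    print(round(s, 2), [(round(w, 3), round(h, 3)) for w, h in default_box_shapes(s, s_next)])
```

A 4x4 map with these 6 shapes yields 96 boxes, and its boxes are much larger than those of a lower 8x8 layer, which is why a large object is picked up by the coarser map.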
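In a sense, SSD combines ideas from RPN and YOLO:
1) The anchor idea from RPN: applying 256 3x3 filters to the feature map effectively describes, at each feature-map position, the features of 9 anchor boxes with a 256-dimensional vector. The position of the sliding window gives the location relative to the original image, and the box regression gives a finer location relative to that sliding window. By operating on the feature map instead of the original image, RPN cuts computation by a factor of about 256 (presumably because a stride-16 feature map has 16x16 = 256 times fewer positions than the image has pixels).
2) The regression idea from YOLO: the object's location and class are regressed directly from the features, without using RoI pooling to extract and classify regions.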
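Point 1) can be illustrated with a small sketch of how a feature-map position fixes 9 anchors in image coordinates, and how regressed offsets then refine one of them. Assumptions: a stride of 16 (VGG16 conv5_3), the Faster R-CNN default anchor sizes of 128, 256 and 512 pixels with height/width ratios 0.5, 1 and 2, anchor centres at (j + 0.5) * stride, and the same (dx, dy, log dw, log dh) offset parametrization as SSD. This is an illustration, not the Faster R-CNN code.

```python
import math

STRIDE = 16                 # assumption: VGG16 conv5_3 has stride 16
SCALES = (128, 256, 512)    # anchor side lengths in pixels
RATIOS = (0.5, 1.0, 2.0)    # height / width

def anchors_at(i, j, stride=STRIDE):
    """The 9 anchors (cx, cy, w, h) of feature-map cell (i, j), in image pixels.

    The cell position alone fixes the anchor centres in the original image;
    each anchor keeps the area s * s while changing its aspect ratio.
    """
    cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
    boxes = []
    for s in SCALES:
        for r in RATIOS:
            boxes.append((cx, cy, s / math.sqrt(r), s * math.sqrt(r)))
    return boxes

def refine(anchor, deltas):
    """Apply regressed offsets (dx, dy, dw, dh) to an anchor for a finer box."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

anchors = anchors_at(3, 5)
print(len(anchors), anchors[4])  # 9 (88.0, 56.0, 256.0, 256.0)
print(refine(anchors[4], (0.1, -0.05, 0.2, 0.0)))
```

SSD's default boxes play the same role as these anchors, but they are placed on several feature maps and the class scores are regressed directly, as in point 2).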