R-CNN detection（RCNN检测）

本文来源：caffe官方文档翻译，地址：RCNN检测

R-CNN是一种最先进的探测器，它可以通过一个finetuned Caffe模型对区域提案进行分类。关于R-CNN系统和模型的完整细节，请参考其项目现场和论文：

丰富的特征层次结构，用于精确的目标检测和语义分割。罗斯·格希克，杰夫·多纳休，特雷弗·达雷尔，吉滕德拉·马利克。CVPR 2014。Arxiv 2013。

注：论文地址：Arxiv

在本例中，我们使用ImageNet的R-CNN模型的完全caffe版本进行检测。R-CNN检测器输出ILSVRC13的200个检测类的类分数。请记住，这些是原始的一个对比所有的SVM得分，因此它们没有经过概率校准，也没有在不同的班级之间进行精确的比较。注意，这个现成的模型只是为了方便，并不是完整的R-CNN模型。

让我们对一张图片进行检测，这是一张在沙漠中骑鱼自行车的人的图片(来自ImageNet challenge ，并非开玩笑)。首先，我们需要 region proposals 和Caffe R-CNN ImageNet模型：

选择性搜索是R-CNN使用的region proposer（区域建议者）。带有python模块的 selective search ijcv with python负责通过选择性搜索MATLAB实现抽取建议。安装它，要下载模块并将其目录名为selective_search_ijcv_with_python，在MATLAB中运行演示程序来编译必要的函数，然后将其添加到python路径中进行导入。(如果您已经准备好了自己的建议性区域，或者不希望麻烦执行此步骤，detect.py必须接受CSV格式的图像列表和包围框。)

运行以下命令获取Caffe R-CNN ImageNet模型：

./scripts/download_model_binary.py models / bvlc_reference_rcnn_ilsvrc13

完成后，我们将调用附带的的detect.py来生成区域提议并运行网络。有关参数的解释，请执行./dectct.py --help。

In[1]:

!mkdir -p _temp
!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt
!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0218 20:43:25.383932 2099749632 net.cpp:42] Initializing net from parameters: 
name: "R-CNN-ilsvrc13"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
state {
  phase: TEST
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc-rcnn"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc-rcnn"
  inner_product_param {
    num_output: 200
  }
}
I0218 20:43:25.385720 2099749632 net.cpp:336] Input 0 -> data
I0218 20:43:25.385769 2099749632 layer_factory.hpp:74] Creating layer conv1
I0218 20:43:25.385783 2099749632 net.cpp:76] Creating Layer conv1
I0218 20:43:25.385790 2099749632 net.cpp:372] conv1 <- data
I0218 20:43:25.385802 2099749632 net.cpp:334] conv1 -> conv1
I0218 20:43:25.385815 2099749632 net.cpp:105] Setting up conv1
I0218 20:43:25.386574 2099749632 net.cpp:112] Top shape: 10 96 55 55 (2904000)
I0218 20:43:25.386610 2099749632 layer_factory.hpp:74] Creating layer relu1
I0218 20:43:25.386625 2099749632 net.cpp:76] Creating Layer relu1
I0218 20:43:25.386631 2099749632 net.cpp:372] relu1 <- conv1
I0218 20:43:25.386641 2099749632 net.cpp:323] relu1 -> conv1 (in-place)
I0218 20:43:25.386649 2099749632 net.cpp:105] Setting up relu1
I0218 20:43:25.386656 2099749632 net.cpp:112] Top shape: 10 96 55 55 (2904000)
I0218 20:43:25.386663 2099749632 layer_factory.hpp:74] Creating layer pool1
I0218 20:43:25.386675 2099749632 net.cpp:76] Creating Layer pool1
I0218 20:43:25.386682 2099749632 net.cpp:372] pool1 <- conv1
I0218 20:43:25.386690 2099749632 net.cpp:334] pool1 -> pool1
I0218 20:43:25.386699 2099749632 net.cpp:105] Setting up pool1
I0218 20:43:25.386716 2099749632 net.cpp:112] Top shape: 10 96 27 27 (699840)
I0218 20:43:25.386725 2099749632 layer_factory.hpp:74] Creating layer norm1
I0218 20:43:25.386736 2099749632 net.cpp:76] Creating Layer norm1
I0218 20:43:25.386744 2099749632 net.cpp:372] norm1 <- pool1
I0218 20:43:25.386803 2099749632 net.cpp:334] norm1 -> norm1
I0218 20:43:25.386819 2099749632 net.cpp:105] Setting up norm1
I0218 20:43:25.386832 2099749632 net.cpp:112] Top shape: 10 96 27 27 (699840)
I0218 20:43:25.386842 2099749632 layer_factory.hpp:74] Creating layer conv2
I0218 20:43:25.386852 2099749632 net.cpp:76] Creating Layer conv2
I0218 20:43:25.386865 2099749632 net.cpp:372] conv2 <- norm1
I0218 20:43:25.386878 2099749632 net.cpp:334] conv2 -> conv2
I0218 20:43:25.386899 2099749632 net.cpp:105] Setting up conv2
I0218 20:43:25.387024 2099749632 net.cpp:112] Top shape: 10 256 27 27 (1866240)
I0218 20:43:25.387042 2099749632 layer_factory.hpp:74] Creating layer relu2
I0218 20:43:25.387050 2099749632 net.cpp:76] Creating Layer relu2
I0218 20:43:25.387058 2099749632 net.cpp:372] relu2 <- conv2
I0218 20:43:25.387066 2099749632 net.cpp:323] relu2 -> conv2 (in-place)
I0218 20:43:25.387075 2099749632 net.cpp:105] Setting up relu2
I0218 20:43:25.387081 2099749632 net.cpp:112] Top shape: 10 256 27 27 (1866240)
I0218 20:43:25.387089 2099749632 layer_factory.hpp:74] Creating layer pool2
I0218 20:43:25.387097 2099749632 net.cpp:76] Creating Layer pool2
I0218 20:43:25.387104 2099749632 net.cpp:372] pool2 <- conv2
I0218 20:43:25.387112 2099749632 net.cpp:334] pool2 -> pool2
I0218 20:43:25.387121 2099749632 net.cpp:105] Setting up pool2
I0218 20:43:25.387130 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.387137 2099749632 layer_factory.hpp:74] Creating layer norm2
I0218 20:43:25.387145 2099749632 net.cpp:76] Creating Layer norm2
I0218 20:43:25.387152 2099749632 net.cpp:372] norm2 <- pool2
I0218 20:43:25.387161 2099749632 net.cpp:334] norm2 -> norm2
I0218 20:43:25.387168 2099749632 net.cpp:105] Setting up norm2
I0218 20:43:25.387176 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.387228 2099749632 layer_factory.hpp:74] Creating layer conv3
I0218 20:43:25.387249 2099749632 net.cpp:76] Creating Layer conv3
I0218 20:43:25.387258 2099749632 net.cpp:372] conv3 <- norm2
I0218 20:43:25.387266 2099749632 net.cpp:334] conv3 -> conv3
I0218 20:43:25.387276 2099749632 net.cpp:105] Setting up conv3
I0218 20:43:25.389375 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.389408 2099749632 layer_factory.hpp:74] Creating layer relu3
I0218 20:43:25.389421 2099749632 net.cpp:76] Creating Layer relu3
I0218 20:43:25.389430 2099749632 net.cpp:372] relu3 <- conv3
I0218 20:43:25.389438 2099749632 net.cpp:323] relu3 -> conv3 (in-place)
I0218 20:43:25.389447 2099749632 net.cpp:105] Setting up relu3
I0218 20:43:25.389456 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.389462 2099749632 layer_factory.hpp:74] Creating layer conv4
I0218 20:43:25.389472 2099749632 net.cpp:76] Creating Layer conv4
I0218 20:43:25.389478 2099749632 net.cpp:372] conv4 <- conv3
I0218 20:43:25.389487 2099749632 net.cpp:334] conv4 -> conv4
I0218 20:43:25.389497 2099749632 net.cpp:105] Setting up conv4
I0218 20:43:25.391810 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.391856 2099749632 layer_factory.hpp:74] Creating layer relu4
I0218 20:43:25.391871 2099749632 net.cpp:76] Creating Layer relu4
I0218 20:43:25.391880 2099749632 net.cpp:372] relu4 <- conv4
I0218 20:43:25.391888 2099749632 net.cpp:323] relu4 -> conv4 (in-place)
I0218 20:43:25.391898 2099749632 net.cpp:105] Setting up relu4
I0218 20:43:25.391906 2099749632 net.cpp:112] Top shape: 10 384 13 13 (648960)
I0218 20:43:25.391913 2099749632 layer_factory.hpp:74] Creating layer conv5
I0218 20:43:25.391923 2099749632 net.cpp:76] Creating Layer conv5
I0218 20:43:25.391929 2099749632 net.cpp:372] conv5 <- conv4
I0218 20:43:25.391937 2099749632 net.cpp:334] conv5 -> conv5
I0218 20:43:25.391947 2099749632 net.cpp:105] Setting up conv5
I0218 20:43:25.393072 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.393108 2099749632 layer_factory.hpp:74] Creating layer relu5
I0218 20:43:25.393122 2099749632 net.cpp:76] Creating Layer relu5
I0218 20:43:25.393129 2099749632 net.cpp:372] relu5 <- conv5
I0218 20:43:25.393138 2099749632 net.cpp:323] relu5 -> conv5 (in-place)
I0218 20:43:25.393148 2099749632 net.cpp:105] Setting up relu5
I0218 20:43:25.393157 2099749632 net.cpp:112] Top shape: 10 256 13 13 (432640)
I0218 20:43:25.393167 2099749632 layer_factory.hpp:74] Creating layer pool5
I0218 20:43:25.393175 2099749632 net.cpp:76] Creating Layer pool5
I0218 20:43:25.393182 2099749632 net.cpp:372] pool5 <- conv5
I0218 20:43:25.393190 2099749632 net.cpp:334] pool5 -> pool5
I0218 20:43:25.393199 2099749632 net.cpp:105] Setting up pool5
I0218 20:43:25.393209 2099749632 net.cpp:112] Top shape: 10 256 6 6 (92160)
I0218 20:43:25.393218 2099749632 layer_factory.hpp:74] Creating layer fc6
I0218 20:43:25.393226 2099749632 net.cpp:76] Creating Layer fc6
I0218 20:43:25.393232 2099749632 net.cpp:372] fc6 <- pool5
I0218 20:43:25.393240 2099749632 net.cpp:334] fc6 -> fc6
I0218 20:43:25.393249 2099749632 net.cpp:105] Setting up fc6
I0218 20:43:25.516396 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516445 2099749632 layer_factory.hpp:74] Creating layer relu6
I0218 20:43:25.516463 2099749632 net.cpp:76] Creating Layer relu6
I0218 20:43:25.516470 2099749632 net.cpp:372] relu6 <- fc6
I0218 20:43:25.516480 2099749632 net.cpp:323] relu6 -> fc6 (in-place)
I0218 20:43:25.516490 2099749632 net.cpp:105] Setting up relu6
I0218 20:43:25.516497 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516505 2099749632 layer_factory.hpp:74] Creating layer drop6
I0218 20:43:25.516515 2099749632 net.cpp:76] Creating Layer drop6
I0218 20:43:25.516521 2099749632 net.cpp:372] drop6 <- fc6
I0218 20:43:25.516530 2099749632 net.cpp:323] drop6 -> fc6 (in-place)
I0218 20:43:25.516538 2099749632 net.cpp:105] Setting up drop6
I0218 20:43:25.516557 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.516566 2099749632 layer_factory.hpp:74] Creating layer fc7
I0218 20:43:25.516576 2099749632 net.cpp:76] Creating Layer fc7
I0218 20:43:25.516582 2099749632 net.cpp:372] fc7 <- fc6
I0218 20:43:25.516589 2099749632 net.cpp:334] fc7 -> fc7
I0218 20:43:25.516599 2099749632 net.cpp:105] Setting up fc7
I0218 20:43:25.604786 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604838 2099749632 layer_factory.hpp:74] Creating layer relu7
I0218 20:43:25.604852 2099749632 net.cpp:76] Creating Layer relu7
I0218 20:43:25.604859 2099749632 net.cpp:372] relu7 <- fc7
I0218 20:43:25.604868 2099749632 net.cpp:323] relu7 -> fc7 (in-place)
I0218 20:43:25.604878 2099749632 net.cpp:105] Setting up relu7
I0218 20:43:25.604885 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604893 2099749632 layer_factory.hpp:74] Creating layer drop7
I0218 20:43:25.604902 2099749632 net.cpp:76] Creating Layer drop7
I0218 20:43:25.604908 2099749632 net.cpp:372] drop7 <- fc7
I0218 20:43:25.604917 2099749632 net.cpp:323] drop7 -> fc7 (in-place)
I0218 20:43:25.604924 2099749632 net.cpp:105] Setting up drop7
I0218 20:43:25.604933 2099749632 net.cpp:112] Top shape: 10 4096 1 1 (40960)
I0218 20:43:25.604939 2099749632 layer_factory.hpp:74] Creating layer fc-rcnn
I0218 20:43:25.604948 2099749632 net.cpp:76] Creating Layer fc-rcnn
I0218 20:43:25.604954 2099749632 net.cpp:372] fc-rcnn <- fc7
I0218 20:43:25.604962 2099749632 net.cpp:334] fc-rcnn -> fc-rcnn
I0218 20:43:25.604971 2099749632 net.cpp:105] Setting up fc-rcnn
I0218 20:43:25.606878 2099749632 net.cpp:112] Top shape: 10 200 1 1 (2000)
I0218 20:43:25.606904 2099749632 net.cpp:165] fc-rcnn does not need backward computation.
I0218 20:43:25.606909 2099749632 net.cpp:165] drop7 does not need backward computation.
I0218 20:43:25.606916 2099749632 net.cpp:165] relu7 does not need backward computation.
I0218 20:43:25.606922 2099749632 net.cpp:165] fc7 does not need backward computation.
I0218 20:43:25.606928 2099749632 net.cpp:165] drop6 does not need backward computation.
I0218 20:43:25.606935 2099749632 net.cpp:165] relu6 does not need backward computation.
I0218 20:43:25.606940 2099749632 net.cpp:165] fc6 does not need backward computation.
I0218 20:43:25.606946 2099749632 net.cpp:165] pool5 does not need backward computation.
I0218 20:43:25.606952 2099749632 net.cpp:165] relu5 does not need backward computation.
I0218 20:43:25.606958 2099749632 net.cpp:165] conv5 does not need backward computation.
I0218 20:43:25.606964 2099749632 net.cpp:165] relu4 does not need backward computation.
I0218 20:43:25.606971 2099749632 net.cpp:165] conv4 does not need backward computation.
I0218 20:43:25.606976 2099749632 net.cpp:165] relu3 does not need backward computation.
I0218 20:43:25.606982 2099749632 net.cpp:165] conv3 does not need backward computation.
I0218 20:43:25.606988 2099749632 net.cpp:165] norm2 does not need backward computation.
I0218 20:43:25.606995 2099749632 net.cpp:165] pool2 does not need backward computation.
I0218 20:43:25.607002 2099749632 net.cpp:165] relu2 does not need backward computation.
I0218 20:43:25.607007 2099749632 net.cpp:165] conv2 does not need backward computation.
I0218 20:43:25.607013 2099749632 net.cpp:165] norm1 does not need backward computation.
I0218 20:43:25.607199 2099749632 net.cpp:165] pool1 does not need backward computation.
I0218 20:43:25.607213 2099749632 net.cpp:165] relu1 does not need backward computation.
I0218 20:43:25.607219 2099749632 net.cpp:165] conv1 does not need backward computation.
I0218 20:43:25.607225 2099749632 net.cpp:201] This network produces output fc-rcnn
I0218 20:43:25.607239 2099749632 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0218 20:43:25.607255 2099749632 net.cpp:213] Network initialization done.
I0218 20:43:25.607262 2099749632 net.cpp:214] Memory required for data: 62425920
E0218 20:43:26.388214 2099749632 upgrade_proto.cpp:618] Attempting to upgrade input file specified using deprecated V1LayerParameter: ../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel
I0218 20:43:27.089423 2099749632 upgrade_proto.cpp:626] Successfully upgraded file specified using deprecated V1LayerParameter
GPU mode
Loading input...
selective_search_rcnn({'/Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg'}, '/var/folders/bk/dtkn5qjd11bd17b2j36zplyw0000gp/T/tmpakaRLL.mat')
Processed 1570 windows in 102.895 s.
/Users/shelhamer/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py:2453: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block1_values] [items->['prediction']]

  warnings.warn(ws, PerformanceWarning)
Saved to _temp/det_output.h5 in 0.298 s.

这次运行是在GPU模式。对于CPU模式检测，在 without the --gpu 参数的情况下调用detection .py。

运行此操作会将带有文件名，选定窗口及其检测分数的DataFrame输出到HDF5文件。（我们只在一张图片上运行，因此文件名都是一样的。

In[2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_hdf('_temp/det_output.h5', 'df')
print(df.shape)
print(df.iloc[0])

(1570, 5)
prediction    [-2.62247, -2.84579, -2.85122, -3.20838, -1.94...
ymin                                                     79.846
xmin                                                       9.62
ymax                                                     246.31
xmax                                                    339.624
Name: /Users/shelhamer/h/desk/caffe/caffe-dev/examples/images/fish-bike.jpg, dtype: object

j结果显示，模型提出了1570个区域的R-CNN配置的选择性搜索。提案的数量会根据图片的内容和大小而有所不同——选择性搜索并不是规模不变的。

一般来说，detect.py在运行大量图像时效率最高:它首先为所有图像提取窗口建议，然后批量处理窗口，以便高效GPU处理，然后输出结果。只需在图像文件中每行列出一个图像，它就会处理所有这些图像。

虽然本指南提供了一个R-CNN ImageNet检测示例，但是detect.py足够聪明，能够适应不同的Caffe模型的输入维度、批大小和输出类别。您可以根据需要切换模型定义和预训练模型。参考python detect.py --help描述数据集。不需要硬编码。

Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc2012 data fetched by

data/ilsvrc12/get_ilsvrc12_aux.sh

无论如何，现在让我们加载ILSVRC13检测类名称并制作预测的DataFrame。请注意，您需要data / ilsvrc12 / get_ilsvrc12_aux.sh提取的辅助ilsvrc2012数据。

In[3]:

with open('../data/ilsvrc12/det_synset_words.txt') as f:
    labels_df = pd.DataFrame([
        {
            'synset_id': l.strip().split(' ')[0],
            'name': ' '.join(l.strip().split(' ')[1:]).split(',')[0]
        }
        for l in f.readlines()
    ])
labels_df.sort('synset_id')
predictions_df = pd.DataFrame(np.vstack(df.prediction.values), columns=labels_df['name'])
print(predictions_df.iloc[0])

name
accordion      -2.622471
airplane       -2.845788
ant            -2.851219
antelope       -3.208377
apple          -1.949950
armadillo      -2.472935
artichoke      -2.201684
axe            -2.327404
baby bed       -2.737925
backpack       -2.176763
bagel          -2.681061
balance beam   -2.722538
banana         -2.390628
band aid       -1.598909
banjo          -2.298197
...
trombone        -2.582361
trumpet         -2.352853
turtle          -2.360859
tv or monitor   -2.761043
unicycle        -2.218467
vacuum          -1.907717
violin          -2.757079
volleyball      -2.723689
waffle iron     -2.418540
washer          -2.408994
water bottle    -2.174899
watercraft      -2.837425
whale           -3.120338
wine bottle     -2.772960
zebra           -2.742913
Name: 0, Length: 200, dtype: float32

让我们来看看激活情况。

In[4]:

plt.gray()
plt.matshow(predictions_df.values)
plt.xlabel('Classes')
plt.ylabel('Windows')

out[4]:

<matplotlib.text.Text at 0x114f15f90>
<matplotlib.figure.Figure at 0x114254b50>

现在让我们以最大交叉为例，遍历所有窗口并绘制顶级类。

In[5]:

max_s = predictions_df.max(0)
max_s.sort(ascending=False)
print(max_s[:10])

name
person          1.835771
bicycle         0.866110
unicycle        0.057080
motorcycle     -0.006122
banjo          -0.028209
turtle         -0.189831
electric fan   -0.206788
cart           -0.214235
lizard         -0.393519
helmet         -0.477942
dtype: float32

最高的探测器实际上是人和自行车。选择好的本地化是一项正在进行的工作;我们选择得分最高的人和自行车检测。

In[6]:

# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()

# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.order(ascending=False)[:5])
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.order(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))

Top detection:
name
person             1.835771
swimming trunks   -1.150371
rubber eraser     -1.231106
turtle            -1.266037
plastic bag       -1.303265
dtype: float32

Second-best detection:
name
bicycle     0.866110
unicycle   -0.359139
scorpion   -0.811621
lobster    -0.982891
lamp       -1.096808
dtype: float32

out[6]:

<matplotlib.patches.Rectangle at 0x118576a90>

这很酷。让我们采取所有'自行车'检测和NMS它们摆脱重叠的窗口。

In[7]:

def nms_detections(dets, overlap=0.3):
    """
    Non-maximum suppression: Greedily select high-scoring detections and
    skip detections that are significantly covered by a previously
    selected detection.

    This version is translated from Matlab code by Tomasz Malisiewicz,
    who sped up Pedro Felzenszwalb's code.

    Parameters
    ----------
    dets: ndarray
        each row is ['xmin', 'ymin', 'xmax', 'ymax', 'score']
    overlap: float
        minimum overlap ratio (0.3 default)

    Output
    ------
    dets: ndarray
        remaining after suppression.
    """
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    ind = np.argsort(dets[:, 4])

    w = x2 - x1
    h = y2 - y1
    area = (w * h).astype(float)

    pick = []
    while len(ind) > 0:
        i = ind[-1]
        pick.append(i)
        ind = ind[:-1]

        xx1 = np.maximum(x1[i], x1[ind])
        yy1 = np.maximum(y1[i], y1[ind])
        xx2 = np.minimum(x2[i], x2[ind])
        yy2 = np.minimum(y2[i], y2[ind])

        w = np.maximum(0., xx2 - xx1)
        h = np.maximum(0., yy2 - yy1)

        wh = w * h
        o = wh / (area[i] + area[ind] - wh)

        ind = ind[np.nonzero(o <= overlap)[0]]

    return dets[pick, :]

In[8]:

scores = predictions_df['bicycle']
windows = df[['xmin', 'ymin', 'xmax', 'ymax']].values
dets = np.hstack((windows, scores[:, np.newaxis]))
nms_dets = nms_detections(dets)

在图像中显示前3个NMS' '检测“自行车”，并注意顶部得分框(红色)和其余框之间的差距。

In[9]:

plt.imshow(im)
currentAxis = plt.gca()
colors = ['r', 'b', 'y']
for c, det in zip(colors, nms_dets[:3]):
    currentAxis.add_patch(
        plt.Rectangle((det[0], det[1]), det[2]-det[0], det[3]-det[1],
        fill=False, edgecolor=c, linewidth=5)
    )
print 'scores:', nms_dets[:3, 4]

scores: [ 0.86610985 -0.70051557 -1.34796357]

这是一个简单的自行车实例，因为它在类的训练集中。然而，person结果是一个真实的检测，因为它不在类的集合中。您应该尝试检测对您自己的图像下一步!(删除要清理的临时目录，就完成了。)

ps：又一个无聊的草稿.....

R-CNN detection（RCNN检测）

继续阅读

使用opencv的dnn模块进行人脸检测

YOLOv8来啦 | 详细解读YOLOv8的改进模块！YOLOv5官方出品YOLOv8！1、YOLOv5回顾2、YOLOv8核心介绍参考文章

对YOLO-v1的理解及阅读笔记YOLO-v1 阅读笔记

yolox运行报错--can‘t find starting numberyolox运行报错–can’t find starting number

【论文阅读笔记】Deep Neural Networks for Object Detection

【论文阅读笔记】CenterNet：Objects as Points

【论文阅读笔记】ThunderNet: Towards Real-time Generic Object Detection

【ICLR2019】Oral 论文汇总

【ICLR2019】Poster 论文汇总

目标检测系列（IV）：YOLO V1、YOLO V2、YOLO V3

pp-picodet从环境配置到部署全流程（5）——PaddleLite端侧部署1. PaddleDetection支持的部署形式说明

目标检测框架｜又一新框架来袭，关系网络用于目标检测（文末附源码）

yolov7 tensorrt模型加速部署【实战】

目标检测：YOLOV3论文解读一、yolov3论文解读

Pytorch机器学习（九）—— YOLO中对于锚框，预测框，产生候选区域及对候选区域进行标注详解 Pytorch机器学习（九）—— YOLO中锚框，预测框，产生候选区域及对候选区域进行标注详解前言一、基本概念二、代码讲解总结

2021-09-30三维点云测量正方形包裹体积