Source code: https://github.com/pierluigiferrari/ssd_keras
1. Data input and storage
object_detection_2d_data_generator.py
Change the label storage format from integer to floating point (note that this doubles the storage space):
hdf5_labels = hdf5_dataset.create_dataset(name='labels',
                                          shape=(dataset_size,),
                                          maxshape=(None,),  # needs the trailing comma: (None) alone is not a tuple
                                          dtype=h5py.special_dtype(vlen=np.float64))  # np.float is deprecated; np.float64 is the same type
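As a quick sanity check (a minimal sketch; 'dataset.h5' is a hypothetical file path), the stored labels now come back as float arrays:

import h5py
import numpy as np

with h5py.File('dataset.h5', 'r') as f:
    first = np.asarray(f['labels'][0])  # one variable-length label row
print(first.dtype)  # float64, now that the vlen base dtype is float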
Adding new data fields (e.g. an angle):
In parse_xml(), modify item_dict and the related output format to include the new fields:
self.labels_output_format = labels_output_format
self.labels_format={'class_id': labels_output_format.index('class_id'),
'xmin': labels_output_format.index('xmin'),
'ymin': labels_output_format.index('ymin'),
'xmax': labels_output_format.index('xmax'),
'ymax': labels_output_format.index('ymax'),
'x1': labels_output_format.index('x1'),
'y1': labels_output_format.index('y1'),
'x2': labels_output_format.index('x2'),
'y2': labels_output_format.index('y2'),
'h': labels_output_format.index('h')
}
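For reference, a hypothetical sketch of the matching change inside the per-object loop of parse_xml(), assuming the extra values live in the same <bndbox> element of the annotation XML under tags named x1, y1, x2, y2, h (adjust to your own annotation format):

item_dict = {'class_id': class_id,
             'xmin': int(bndbox.xmin.text),
             'ymin': int(bndbox.ymin.text),
             'xmax': int(bndbox.xmax.text),
             'ymax': int(bndbox.ymax.text),
             'x1': float(bndbox.x1.text),
             'y1': float(bndbox.y1.text),
             'x2': float(bndbox.x2.text),
             'y2': float(bndbox.y2.text),
             'h': float(bndbox.h.text)}
box = []
for item in self.labels_output_format:
    box.append(item_dict[item])  # ordered according to labels_output_format
boxes.append(box)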
2. Data encoding
ssd_input_encoder.py
Add indices for the new fields:
class_id = 0
xmin = 1
ymin = 2
xmax = 3
ymax = 4
x1 = 5
y1 = 6
x2 = 7
y2 = 8
h = 9
All the data for each batch is stored in:
y_encoded = self.generate_encoding_template(batch_size=batch_size, diagnostics=False)
The statement above initializes the y_encoded template first, fixing the shape of the data. Because new fields have been added, self.generate_encoding_template(batch_size=batch_size, diagnostics=False) has to be modified as well. The modified and added code is as follows:
# Template for the five new fields (x1, y1, x2, y2, h): the two points are
# initialized to the top-left and top-right corners of the axis-aligned
# anchor (image coordinates, y pointing down), and h to the anchor height.
rotatetensor = np.zeros((batch_size, boxes_tensor.shape[1], 5))
cx = boxes_tensor[..., 0]
cy = boxes_tensor[..., 1]
w = boxes_tensor[..., 2]
h = boxes_tensor[..., 3]
rotatetensor[..., 0] = cx - w/2  # x1
rotatetensor[..., 1] = cy - h/2  # y1
rotatetensor[..., 2] = cx + w/2  # x2
rotatetensor[..., 3] = cy - h/2  # y2 (the top edge, same as y1)
rotatetensor[..., 4] = h         # height
y_encoding_template = np.concatenate((classes_tensor, boxes_tensor, rotatetensor, boxes_tensor, variances_tensor), axis=2)
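With this template, each anchor's vector along the last axis is laid out as follows; this is where the -13/-8 index arithmetic used below comes from:

# [one-hot classes | gt box: cx, cy, w, h | new fields: x1, y1, x2, y2, h |
#  anchor box: cx, cy, w, h | variances (4 values)]
# so, counting from the end:
#   [-17:-13] ground-truth box
#   [-13:-8]  new rotated-box fields
#   [-8:-4]   anchor box
#   [-4:]     variances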
Coordinate normalization code:
if self.normalize_coords:
    labels[:,[ymin,ymax]] /= self.img_height # Normalize ymin and ymax relative to the image height
    labels[:,[xmin,xmax]] /= self.img_width  # Normalize xmin and xmax relative to the image width
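The new columns presumably need the same treatment. It is not shown in the snippet above, but it is consistent with the decoding in section 3, where x1/x2 are scaled back by the image width and y1/y2/h by the image height; a sketch of the lines to add under the same if:

    labels[:,[x1,x2]] /= self.img_width     # assumption: the new x-coordinates scale with the width
    labels[:,[y1,y2,h]] /= self.img_height  # assumption: the new y-coordinates and the height scale with the height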
When matching the default (anchor) boxes to the ground-truth boxes, the ground-truth data is written into the corresponding positions; wherever indices appear, remember to adjust them to the new layout:
y_encoded[i, bipartite_matches, :-13] = labels_one_hot
y_encoded[i, bipartite_matches, -13:-8] = labels[:, [x1,y1,x2,y2,h]]
Multi-match strategy:
y_encoded[i, matches[1], :-13] = labels_one_hot[matches[0]]
for k in range(len(matches[1])):
    y_encoded[i, matches[1][k], -13:-8] = labels[matches[0][k], -5:]
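Since the five new fields occupy the last columns of labels, the Python loop can equivalently be written with numpy fancy indexing (when one anchor matches several ground-truth boxes, one write wins, just as the loop's last iteration does):

y_encoded[i, matches[1], -13:-8] = labels[matches[0], -5:]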
Compute the offsets (these are the values the network has to predict). Note that at this point cx and cy no longer correspond to the original image annotations (the data has passed through augmentation, e.g. random cropping):
if self.coords == 'centroids':
y_encoded[:,:,[-17,-16]] -= y_encoded[:,:,[-8,-7]] # cx(gt) - cx(anchor), cy(gt) - cy(anchor)
y_encoded[:,:,[-17,-16]] /= y_encoded[:,:,[-6,-5]] * y_encoded[:,:,[-4,-3]] # (cx(gt) - cx(anchor)) / w(anchor) / cx_variance, (cy(gt) - cy(anchor)) / h(anchor) / cy_variance
y_encoded[:,:,[-15,-14]] /= y_encoded[:,:,[-6,-5]] # w(gt) / w(anchor), h(gt) / h(anchor)
y_encoded[:,:,[-15,-14]] = np.log(y_encoded[:,:,[-15,-14]]) / y_encoded[:,:,[-2,-1]] # ln(w(gt) / w(anchor)) / w_variance, ln(h(gt) / h(anchor)) / h_variance (ln == natural logarithm)
# Offsets relative to the default anchor
anchorcx = y_encoded[:,:, -8]
anchorcy = y_encoded[:,:, -7]
anchorw = y_encoded[:,:, -6]
anchorh = y_encoded[:,:, -5]
anchorx1 = anchorcx - anchorw/2
anchory1 = anchorcy - anchorh/2
y_encoded[:, :, -13] -= anchorx1 # x1(gt) - x1(anchor)
y_encoded[:, :, -12] -= anchory1 # y1(gt) - y1(anchor)
y_encoded[:, :, [-13,-12]] /= y_encoded[:,:, [-6,-5]] * y_encoded[:,:,[-4,-3]] # scale by anchor w/h and the cx/cy variances
y_encoded[:, :, -11] -= (anchorcx+anchorw/2) # x2(gt) - x2(anchor)
y_encoded[:, :, -10] -= (anchorcy-anchorh/2) # y2(gt) - y2(anchor); the anchor's y2 is its top edge, same as y1
y_encoded[:, :, [-11, -10]] /= y_encoded[:, :, [-6, -5]] * y_encoded[:,:,[-4,-3]]
y_encoded[:, :, -9] /= y_encoded[:, :, -5] # h(gt) / h(anchor)
y_encoded[:, :, -9] = np.log(y_encoded[:, :, -9]) / y_encoded[:, :, -1] # ln(h(gt) / h(anchor)) / h_variance
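To see that the decoding in section 3 inverts this encoding, here is a tiny numpy round trip for the x1 and h channels (a standalone sketch with made-up numbers; var_x and var_h stand in for the cx and h variances):

import numpy as np

gt_x1, gt_h = 0.30, 0.40  # made-up ground truth (normalized coordinates)
anchor_cx, anchor_w, anchor_h = 0.35, 0.20, 0.45
var_x, var_h = 0.1, 0.2

anchor_x1 = anchor_cx - anchor_w / 2
# Encoding, mirroring the code above:
x1_off = (gt_x1 - anchor_x1) / (anchor_w * var_x)
h_off = np.log(gt_h / anchor_h) / var_h
# Decoding, mirroring section 3:
assert np.isclose(x1_off * var_x * anchor_w + anchor_x1, gt_x1)
assert np.isclose(np.exp(h_off * var_h) * anchor_h, gt_h)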
resnet_keras_ssd300.py: the localization output now has more channels, so the prediction layers must output the additional values as well:
# We predict 6 localization values for each box, hence the localization predictors have depth `n_boxes * 6`
# Output shape of the localization layers: `(batch, height, width, n_boxes * 6)`
conv4_3_norm_mbox_loc = Conv2D(n_boxes[0] * 6, (3, 3), padding='same', kernel_initializer='he_normal',
kernel_regularizer=l2(l2_reg), name='conv4_3_norm_mbox_loc')(conv4_3_norm)
fc7_mbox_loc = Conv2D(n_boxes[1] * 6, (3, 3), padding='same', kernel_initializer='he_normal',
kernel_regularizer=l2(l2_reg), name='fc7_mbox_loc')(fc7)
conv6_2_mbox_loc = Conv2D(n_boxes[2] * 6, (3, 3), padding='same', kernel_initializer='he_normal',
kernel_regularizer=l2(l2_reg), name='conv6_2_mbox_loc')(conv6_2)
conv7_2_mbox_loc = Conv2D(n_boxes[3] * 6, (3, 3), padding='same', kernel_initializer='he_normal',
kernel_regularizer=l2(l2_reg), name='conv7_2_mbox_loc')(conv7_2)
conv8_2_mbox_loc = Conv2D(n_boxes[4] * 6, (3, 3), padding='same', kernel_initializer='he_normal',
kernel_regularizer=l2(l2_reg), name='conv8_2_mbox_loc')(conv8_2)
conv9_2_mbox_loc = Conv2D(n_boxes[5] * 6, (3, 3), padding='same', kernel_initializer='he_normal',
kernel_regularizer=l2(l2_reg), name='conv9_2_mbox_loc')(conv9_2)
# Reshape the box predictions, yielding 3D tensors of shape `(batch, height * width * n_boxes, 6)`
# We want the six localization values isolated in the last axis to compute the smooth L1 loss
conv4_3_norm_mbox_loc_reshape = Reshape((-1, 6), name='conv4_3_norm_mbox_loc_reshape')(conv4_3_norm_mbox_loc)
fc7_mbox_loc_reshape = Reshape((-1, 6), name='fc7_mbox_loc_reshape')(fc7_mbox_loc)
conv6_2_mbox_loc_reshape = Reshape((-1, 6), name='conv6_2_mbox_loc_reshape')(conv6_2_mbox_loc)
conv7_2_mbox_loc_reshape = Reshape((-1, 6), name='conv7_2_mbox_loc_reshape')(conv7_2_mbox_loc)
conv8_2_mbox_loc_reshape = Reshape((-1, 6), name='conv8_2_mbox_loc_reshape')(conv8_2_mbox_loc)
conv9_2_mbox_loc_reshape = Reshape((-1, 6), name='conv9_2_mbox_loc_reshape')(conv9_2_mbox_loc)
3. Prediction decoding
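The decoding lives in the DecodeDetections layer (keras_layers/keras_layer_DecodeDetections.py in the original repo). The new offsets are inverted back into coordinates exactly as they were encoded above; myxmin, myymin, and myxmax are presumably the anchors' corner coordinates, computed from the anchor cx/cy/w/h channels in the same way as during encoding: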
x1 = y_pred[..., -13] * y_pred[..., -4] * y_pred[..., -6] + myxmin # x1_off * cx_variance * w(anchor) + x1(anchor)
y1 = y_pred[..., -12] * y_pred[..., -3] * y_pred[..., -5] + myymin # y1_off * cy_variance * h(anchor) + y1(anchor)
x2 = y_pred[..., -11] * y_pred[..., -4] * y_pred[..., -6] + myxmax # x2_off * cx_variance * w(anchor) + x2(anchor)
y2 = y_pred[..., -10] * y_pred[..., -3] * y_pred[..., -5] + myymin # the anchor's y2 is its top edge, i.e. its ymin
h = tf.exp(y_pred[...,-9] * y_pred[...,-1]) * y_pred[...,-5]       # exp(h_off * h_variance) * h(anchor)
# If the model predicts box coordinates relative to the image dimensions and they are supposed
# to be converted back to absolute coordinates, do that.
def normalized_coords():
    xmin1 = tf.expand_dims(xmin * self.tf_img_width, axis=-1)
    ymin1 = tf.expand_dims(ymin * self.tf_img_height, axis=-1)
    xmax1 = tf.expand_dims(xmax * self.tf_img_width, axis=-1)
    ymax1 = tf.expand_dims(ymax * self.tf_img_height, axis=-1)
    mx1 = tf.expand_dims(x1 * self.tf_img_width, axis=-1)
    my1 = tf.expand_dims(y1 * self.tf_img_height, axis=-1)
    mx2 = tf.expand_dims(x2 * self.tf_img_width, axis=-1)
    my2 = tf.expand_dims(y2 * self.tf_img_height, axis=-1)
    mh = tf.expand_dims(h * self.tf_img_height, axis=-1)
    return xmin1, ymin1, xmax1, ymax1, mx1, my1, mx2, my2, mh
def non_normalized_coords():
    return tf.expand_dims(xmin, axis=-1), tf.expand_dims(ymin, axis=-1), tf.expand_dims(xmax, axis=-1), tf.expand_dims(ymax, axis=-1), \
           tf.expand_dims(x1, axis=-1), tf.expand_dims(y1, axis=-1), tf.expand_dims(x2, axis=-1), tf.expand_dims(y2, axis=-1), tf.expand_dims(h, axis=-1)
xmin, ymin, xmax, ymax, x1, y1, x2, y2, h = tf.cond(self.tf_normalize_coords, normalized_coords, non_normalized_coords)
Remember to update n_classes as well as the output dimensions:
n_classes = y_pred.shape[2] - 6
....
box_coordinates = batch_item[...,-6:]
.....
def no_confident_predictions():
    return tf.constant(value=0.0, shape=(1,8))
.....
filtered_predictions = tf.reshape(tensor=filtered_single_classes, shape=(-1,8))
.....
def compute_output_shape(self, input_shape):
    batch_size, n_boxes, last_axis = input_shape
    return (batch_size, self.tf_top_k, 8) # Last axis: (class_ID, confidence, 6 box coordinates)
def filter_single_class(index):
    # From a tensor of shape (n_boxes, n_classes + 6 coordinates) extract
    # a tensor of shape (n_boxes, 1 + 1 + 6) that contains the class ID, the
    # confidence values for just one class (determined by `index`), and the
    # box coordinates.
    confidences = tf.expand_dims(batch_item[..., index], axis=-1)
    class_id = tf.fill(dims=tf.shape(confidences), value=tf.to_float(index))
    box_coordinates = batch_item[...,-6:] #**************
    single_class = tf.concat([class_id, confidences, box_coordinates], axis=-1)
    # Apply confidence thresholding with respect to the class defined by `index`.
    threshold_met = single_class[:,1] > self.tf_confidence_thresh
    single_class = tf.boolean_mask(tensor=single_class,
                                   mask=threshold_met)
    # If any boxes made the threshold, perform NMS.
    def perform_nms():
        scores = single_class[...,1]
        # `tf.image.non_max_suppression()` needs the box coordinates in the format `(ymin, xmin, ymax, xmax)`.
        # NMS runs on the axis-aligned box only; the extra fields are carried along via `tf.gather`.
        xmin = tf.expand_dims(single_class[...,-6], axis=-1) #**************
        ymin = tf.expand_dims(single_class[...,-5], axis=-1) #**************
        xmax = tf.expand_dims(single_class[...,-4], axis=-1) #**************
        ymax = tf.expand_dims(single_class[...,-3], axis=-1) #**************
        boxes = tf.concat(values=[ymin, xmin, ymax, xmax], axis=-1)
        maxima_indices = tf.image.non_max_suppression(boxes=boxes,
                                                      scores=scores,
                                                      max_output_size=self.tf_nms_max_output_size,
                                                      iou_threshold=self.iou_threshold,
                                                      name='non_maximum_suppresion')
        maxima = tf.gather(params=single_class,
                           indices=maxima_indices,
                           axis=0)
        return maxima
4. Data augmentation
data_augmentation_chain_original_ssd.py
Temporarily remove the augmentations that affect the label output (to be revisited and optimized later):
self.sequence = [
    # self.photometric_distortions,
    # self.expand,
    # self.random_crop,
    # self.random_flip,
    self.resize]
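Note that the Resize transform that is kept only rescales the four corner columns named in its labels_format, so it would leave the five new columns untouched; they have to be rescaled as well. A standalone sketch of the required arithmetic (a hypothetical helper, not part of the ssd_keras API):

import numpy as np

def rescale_labels(labels, img_height, img_width, out_height, out_width):
    """Rescale absolute label coordinates from (img_height, img_width) to
    (out_height, out_width). Column order as above:
    (class_id, xmin, ymin, xmax, ymax, x1, y1, x2, y2, h)."""
    labels = np.array(labels, dtype=np.float64)
    labels[:, [1, 3, 5, 7]] *= out_width / img_width       # x-like columns
    labels[:, [2, 4, 6, 8, 9]] *= out_height / img_height  # y-like columns and the height h
    return labels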