Mask R-CNN PyTorch implementation

For personal use: notes on the maskrcnn PyTorch code.

1. Modules

batch_norm

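class FrozenBatchNorm2d():
function: batch normalization with frozen statistics. The running mean/variance and the affine weight/bias are fixed buffers, not trainable parameters.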

tensor.half(): converts a tensor to half precision (torch.float16, also available as torch.half)

```
tensor.rsqrt(): reciprocal square root, i.e. 1 / sqrt(x)
```
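tensor.half() and tensor.rsqrt() are the two helpers a frozen batch norm needs. The sketch below assumes that weight, bias, running_mean and running_var are stored as buffers. Like this sketch, the layer is assumed to add no eps, so running_var must stay strictly positive. It illustrates the idea and is not the repo's exact code:

```python
import torch
from torch import nn


class FrozenBatchNorm2d(nn.Module):
    """BatchNorm2d whose statistics and affine parameters are fixed buffers."""

    def __init__(self, n):
        super().__init__()
        # Buffers, not Parameters: saved in the state dict, never touched by the optimizer.
        self.register_buffer("weight", torch.ones(n))
        self.register_buffer("bias", torch.zeros(n))
        self.register_buffer("running_mean", torch.zeros(n))
        self.register_buffer("running_var", torch.ones(n))

    def forward(self, x):
        weight, bias = self.weight, self.bias
        mean, var = self.running_mean, self.running_var
        if x.dtype == torch.float16:
            # Match the input precision with tensor.half().
            weight, bias, mean, var = weight.half(), bias.half(), mean.half(), var.half()
        # y = (x - mean) / sqrt(var) * weight + bias, folded into one scale and one shift.
        scale = weight * var.rsqrt()
        shift = bias - mean * scale
        return x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)
```

Folding the normalization into a single per-channel scale and shift means the layer costs one multiply-add per element at inference time.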

misc

Helper classes that let some layers and functions accept empty tensors as input.
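Older PyTorch releases raised errors when a layer received a batch with zero elements. This happens, for example, when no proposals survive filtering. Recent releases handle many of these cases natively. Below is a simplified sketch of such a wrapper for Conv2d. It only computes the correct empty output shape and does not reproduce any custom autograd handling the repo may use:

```python
import torch
from torch import nn


class Conv2d(nn.Conv2d):
    """nn.Conv2d that returns a correctly shaped empty tensor for empty input."""

    def forward(self, x):
        if x.numel() > 0:
            return super().forward(x)
        # Standard convolution output-size formula, applied to each spatial dim.
        spatial = [
            (size + 2 * pad - (dil * (k - 1) + 1)) // stride + 1
            for size, pad, dil, k, stride in zip(
                x.shape[-2:], self.padding, self.dilation, self.kernel_size, self.stride
            )
        ]
        return x.new_empty([x.shape[0], self.out_channels] + spatial)
```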

backbone

resnet
resnet + fpn
retina + fpn

fpn

function: starting from conv2 (C2), build the FPN network with lateral connections

getattr(object, name[, default]): returns the value of object's attribute called name (or default if the attribute does not exist)

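- Upsampling is done by interpolation:

F.interpolate(last_inner, scale_factor=2, mode="nearest")
 scale_factor: how many times larger the output is than the input (with scale_factor=n, both width and height are multiplied by n)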

nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
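The top-down FPN pass combines the pieces above: getattr on submodules registered under string names, nearest-neighbour F.interpolate, and a stride-2 max pool for an extra, coarser level. The class name SimpleFPN, the fpn_inner/fpn_layer names and the kernel_size=1 pooling for P6 are assumptions made for this sketch:

```python
import torch
import torch.nn.functional as F
from torch import nn


class SimpleFPN(nn.Module):
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.inner_blocks, self.layer_blocks = [], []
        for idx, in_channels in enumerate(in_channels_list, 1):
            inner_name, layer_name = f"fpn_inner{idx}", f"fpn_layer{idx}"
            # 1x1 lateral conv and 3x3 output conv, registered under string names.
            self.add_module(inner_name, nn.Conv2d(in_channels, out_channels, 1))
            self.add_module(layer_name, nn.Conv2d(out_channels, out_channels, 3, padding=1))
            self.inner_blocks.append(inner_name)
            self.layer_blocks.append(layer_name)

    def forward(self, features):
        # features: [C2, C3, C4, C5], from high resolution to low resolution.
        last_inner = getattr(self, self.inner_blocks[-1])(features[-1])
        results = [getattr(self, self.layer_blocks[-1])(last_inner)]
        for feature, inner_name, layer_name in zip(
            features[-2::-1], self.inner_blocks[-2::-1], self.layer_blocks[-2::-1]
        ):
            # Upsample the coarser map 2x and add the lateral connection.
            top_down = F.interpolate(last_inner, scale_factor=2, mode="nearest")
            last_inner = getattr(self, inner_name)(feature) + top_down
            results.insert(0, getattr(self, layer_name)(last_inner))
        # Extra level (P6): max pooling with kernel_size=1, stride=2 on the coarsest output.
        results.append(F.max_pool2d(results[-1], kernel_size=1, stride=2, padding=0))
        return results
```

Usage: four backbone maps with 8, 16, 32 and 64 channels at sizes 32, 16, 8 and 4 give five outputs with 16 channels at sizes 32, 16, 8, 4 and 2.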

retinanet

function: attach a RetinaNet head to every FPN output level; it outputs bbox regression and classification results.

cls_tower[]: conv layers of the classification branch

```python
cls_tower.append(
    nn.Conv2d(
        in_channels,
        in_channels,
        kernel_size=3,
        stride=1,
        padding=1
    )
)
```

bbox_tower[]: conv layers of the bbox regression branch

```python
bbox_tower.append(
    nn.Conv2d(
        in_channels,
        in_channels,
        kernel_size=3,
        stride=1,
        padding=1
    )
)
```

cls_logits(): conv layer that outputs the classification logits

```python
self.cls_logits = nn.Conv2d(
    in_channels, num_anchors * num_classes, kernel_size=3, stride=1,
    padding=1
)
```

bbox_pred(): conv layer that outputs the bbox regression

```python
self.bbox_pred = nn.Conv2d(
    in_channels, num_anchors * 4, kernel_size=3, stride=1,
    padding=1
)
```

RetinaNet parameter initialization

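1. All conv layers (cls_tower, bbox_tower, cls_logits, bbox_pred): weights are drawn from a normal distribution with standard deviation 0.01, and biases are set to 0.

2. The bias of the last layer of the cls branch (cls_logits) is then overwritten with -math.log((1 - prior_prob) / prior_prob). prior_prob is usually 0.01, so every anchor starts with a foreground probability of about prior_prob.

3. The bbox regression weights are:

box_coder = BoxCoder(weights=(10., 10., 5., 5.))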
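The head layers and the initialization rules above fit together as sketched below. Several details are assumptions: the class name, num_convs=4 (the tower depth from the RetinaNet paper) and the ReLU after each tower conv. Here num_classes excludes background, which matches a sigmoid-based classifier:

```python
import math

import torch
from torch import nn


class RetinaHeadSketch(nn.Module):
    def __init__(self, in_channels, num_anchors, num_classes, num_convs=4, prior_prob=0.01):
        super().__init__()
        cls_tower, bbox_tower = [], []
        for _ in range(num_convs):
            cls_tower += [nn.Conv2d(in_channels, in_channels, 3, stride=1, padding=1), nn.ReLU()]
            bbox_tower += [nn.Conv2d(in_channels, in_channels, 3, stride=1, padding=1), nn.ReLU()]
        self.cls_tower = nn.Sequential(*cls_tower)
        self.bbox_tower = nn.Sequential(*bbox_tower)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors * num_classes, 3, stride=1, padding=1)
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, 3, stride=1, padding=1)

        # Rule 1: every conv gets weight ~ N(0, 0.01^2) and a zero bias.
        for module in [self.cls_tower, self.bbox_tower, self.cls_logits, self.bbox_pred]:
            for layer in module.modules():
                if isinstance(layer, nn.Conv2d):
                    nn.init.normal_(layer.weight, std=0.01)
                    nn.init.constant_(layer.bias, 0)
        # Rule 2: the classification bias makes sigmoid(logit) start at prior_prob.
        nn.init.constant_(self.cls_logits.bias, -math.log((1 - prior_prob) / prior_prob))

    def forward(self, features):
        # One (logits, bbox_reg) pair per FPN level; the same weights are shared across levels.
        logits = [self.cls_logits(self.cls_tower(f)) for f in features]
        bbox_reg = [self.bbox_pred(self.bbox_tower(f)) for f in features]
        return logits, bbox_reg
```

The prior bias keeps the huge number of background anchors from dominating the loss in the first iterations, because every anchor starts out predicting "mostly background".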

box_coder

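function: converts boxes into bbox regression targets (encode) and regression outputs back into boxes (decode).

proposal [x1, y1, x2, y2] -------> ex_* (the box being encoded)

reference_box [x1, y1, x2, y2] -------> gt_* (the ground-truth box)

weights: the four weights (wx, wy, ww, wh) used in bbox regression

bbox regression targets (widths, heights and centers are first computed from the corner coordinates):

```python
wx, wy, ww, wh = self.weights
targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
targets_dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights
targets_dw = ww * torch.log(gt_widths / ex_widths)
targets_dh = wh * torch.log(gt_heights / ex_heights)
```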

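Concatenating tensors along a given dimension:

- *torch.cat(tensors, dim=0, out=None) → Tensor*
Concatenates the tensors along an existing dimension; the result has the same number of dimensions as the inputs.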

```python
>>> import torch
>>> a = torch.rand((2, 3))
>>> b = torch.rand((2, 3))
>>> c = torch.cat((a, b))
>>> a.size(), b.size(), c.size()
(torch.Size([2, 3]), torch.Size([2, 3]), torch.Size([4, 3]))
```

- *torch.stack(tensors, dim=0, out=None) → Tensor*
    Stacks the tensors along a new dimension; the result has one more dimension than the inputs.

```python
>>> import torch
>>> a = torch.rand((2, 3))
>>> b = torch.rand((2, 3))
>>> c = torch.stack((a, b))
>>> a.size(), b.size(), c.size()
(torch.Size([2, 3]), torch.Size([2, 3]), torch.Size([2, 2, 3]))
```

- torch.clamp(input, min, max, out=None) → Tensor
- Clamps every element of the input tensor into [min, max]: values below min become min and values above max become max. out is an optional output tensor.
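A quick check of both forms of torch.clamp. The one-sided form (only max) is the kind of call a box decoder uses to cap dw and dh before torch.exp:

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
both = torch.clamp(x, min=0.0, max=1.0)  # clip from both sides
upper = torch.clamp(x, max=1.0)          # only an upper bound; -2.0 is left alone
```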

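encode: given a proposal and its gt box, computes the regression targets [dx, dy, dw, dh] that move the proposal onto the gt box
decode: applies the regressed offsets to the proposals (reference boxes) and returns the resulting pred_boxes [x1, y1, x2, y2]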
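encode and decode are inverses of each other. The minimal sketch below assumes boxes in [x1, y1, x2, y2] with the "+1" width convention and caps dw/dh at log(1000/16). The function names and the cap value are assumptions written for illustration:

```python
import math

import torch


def encode(gt_boxes, proposals, weights=(10.0, 10.0, 5.0, 5.0)):
    """Regression targets [dx, dy, dw, dh] that move each proposal onto its gt box."""
    wx, wy, ww, wh = weights
    # Boxes are [x1, y1, x2, y2]; "+ 1" treats coordinates as inclusive pixel indices.
    ex_widths = proposals[:, 2] - proposals[:, 0] + 1
    ex_heights = proposals[:, 3] - proposals[:, 1] + 1
    ex_ctr_x = proposals[:, 0] + 0.5 * ex_widths
    ex_ctr_y = proposals[:, 1] + 0.5 * ex_heights
    gt_widths = gt_boxes[:, 2] - gt_boxes[:, 0] + 1
    gt_heights = gt_boxes[:, 3] - gt_boxes[:, 1] + 1
    gt_ctr_x = gt_boxes[:, 0] + 0.5 * gt_widths
    gt_ctr_y = gt_boxes[:, 1] + 0.5 * gt_heights

    targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
    targets_dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights
    targets_dw = ww * torch.log(gt_widths / ex_widths)
    targets_dh = wh * torch.log(gt_heights / ex_heights)
    return torch.stack((targets_dx, targets_dy, targets_dw, targets_dh), dim=1)


def decode(deltas, proposals, weights=(10.0, 10.0, 5.0, 5.0), clip=math.log(1000.0 / 16)):
    """Apply predicted deltas to the proposals; returns pred_boxes [x1, y1, x2, y2]."""
    wx, wy, ww, wh = weights
    widths = proposals[:, 2] - proposals[:, 0] + 1
    heights = proposals[:, 3] - proposals[:, 1] + 1
    ctr_x = proposals[:, 0] + 0.5 * widths
    ctr_y = proposals[:, 1] + 0.5 * heights

    dx = deltas[:, 0] / wx
    dy = deltas[:, 1] / wy
    # Cap dw/dh so torch.exp cannot blow up on a bad prediction.
    dw = torch.clamp(deltas[:, 2] / ww, max=clip)
    dh = torch.clamp(deltas[:, 3] / wh, max=clip)

    pred_ctr_x = dx * widths + ctr_x
    pred_ctr_y = dy * heights + ctr_y
    pred_w = torch.exp(dw) * widths
    pred_h = torch.exp(dh) * heights
    # "- 1" on x2/y2 undoes the "+ 1" used when computing widths and heights.
    return torch.stack(
        (pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
         pred_ctr_x + 0.5 * pred_w - 1, pred_ctr_y + 0.5 * pred_h - 1),
        dim=1,
    )
```

decode(encode(gt, proposals), proposals) recovers gt up to floating-point error, and all-zero deltas return the proposals unchanged.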

Matcher()

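function: assigns a label to each box (prediction) according to its best IoU with the gt boxes.

There are three cases:

IoU ≥ high_threshold: labeled with the index of the matched gt box;

IoU < low_threshold: labeled -1 (BELOW_LOW_THRESHOLD);

in between: labeled -2 (BETWEEN_THRESHOLDS).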
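A minimal sketch of the matching rule, assuming iou is a [num_gt, num_boxes] IoU matrix and the flag values -1 and -2 listed above. The 0.7 / 0.3 defaults are RPN-style assumptions, and the sketch omits the real Matcher's option for allowing low-quality matches:

```python
import torch

BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2


def match(iou, high_threshold=0.7, low_threshold=0.3):
    """Return, for each box, the index of its gt box or a negative flag."""
    # Best gt for every box: max over the gt dimension, plus the argmax index.
    matched_vals, matches = iou.max(dim=0)
    below_low = matched_vals < low_threshold
    between = (matched_vals >= low_threshold) & (matched_vals < high_threshold)
    matches[below_low] = BELOW_LOW_THRESHOLD
    matches[between] = BETWEEN_THRESHOLDS
    return matches
```

For example, with two gt boxes and four predictions, a box with best IoU 0.5 gets -2 and a box with best IoU 0.1 gets -1.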

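torch.max(input, dim, keepdim=False, out=None)
  1. With dim given, returns the maximum values along dim together with their indices;
  2. Without dim, returns the single maximum over all elements of the tensor.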
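The two call forms side by side. The dim form returns a (values, indices) pair, which is what a matcher uses to find each box's best gt:

```python
import torch

x = torch.tensor([[1.0, 5.0, 2.0],
                  [4.0, 0.0, 3.0]])
values, indices = torch.max(x, dim=1)              # per-row max and its column index
overall = torch.max(x)                             # one value over the whole tensor
kept = torch.max(x, dim=1, keepdim=True).values    # keepdim keeps the reduced dim as size 1
```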

torch.range(start, end, step): includes end (deprecated in favor of torch.arange)
  torch.arange(start, end, step): excludes end
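Because torch.range is deprecated, the inclusive behaviour can be reproduced with torch.arange by extending the end by one step:

```python
import torch

a = torch.arange(0, 9, 3)         # end excluded: 0, 3, 6
b = torch.arange(0, 9 + 3, 3)     # end + step: 0, 3, 6, 9, like the deprecated torch.range(0, 9, 3)
c = torch.arange(0.0, 1.0, 0.25)  # fractional steps work as well
```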

torch.squeeze(a, N): removes dimensions of size 1 (or only dimension N, if N is given and that dimension has size 1); N is optional;
  torch.unsqueeze(a, N): inserts a dimension of size 1 at position N;

```python
import torch

a = torch.randn(1, 3)
b = torch.unsqueeze(a, 1)
c = a.unsqueeze(0)
d = torch.squeeze(c)
f = torch.randn(3)
g = f.unsqueeze(0)
# a.size=([1,3]), b.size=([1,1,3]), c.size=([1,1,3]), d.size=([3]), f.size=([3]), g.size=([1,3])
```

sigmoid_focal_loss()

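```python
import torch


def sigmoid_focal_loss_cpu(logits, targets, gamma, alpha):
    # logits: [N, num_classes]; targets: [N], 0 = background, 1..num_classes = class, -1 = ignore
    num_classes = logits.shape[1]
    dtype = targets.dtype
    device = targets.device
    # class_range: the foreground class labels 1..num_classes (background excluded), size [1, num_classes]
    class_range = torch.arange(1, num_classes + 1, dtype=dtype, device=device).unsqueeze(0)

    t = targets.unsqueeze(1)  # t.size = [N, 1]
    p = torch.sigmoid(logits)
    term1 = (1 - p) ** gamma * torch.log(p)  # term for the target class
    term2 = p ** gamma * torch.log(1 - p)    # term for the non-target classes
    # Non-target classes are weighted by 1 - alpha; ignored samples (t < 0) contribute nothing.
    # Returns the unreduced per-element loss of shape [N, num_classes].
    return -(t == class_range).float() * term1 * alpha - ((t != class_range) & (t >= 0)).float() * term2 * (1 - alpha)
```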

smooth_l1_loss

```
torch.where(condition, x, y)   ->  Tensor
```
Returns a tensor whose elements come from x where condition is True and from y everywhere else.

```python
>>> x = torch.randn(3, 2)
>>> y = torch.ones(3, 2)
>>> x
tensor([[-0.4620,  0.3139],
        [ 0.3898, -0.7197],
        [ 0.0478, -0.1657]])
>>> torch.where(x > 0, x, y)
tensor([[ 1.0000,  0.3139],
        [ 0.3898,  1.0000],
        [ 0.0478,  1.0000]])
>>> x = torch.randn(2, 2, dtype=torch.double)
>>> x
tensor([[ 1.0779,  0.0383],
        [-0.8785, -1.1089]], dtype=torch.float64)
>>> torch.where(x > 0, x, 0.)
tensor([[1.0779, 0.0383],
        [0.0000, 0.0000]], dtype=torch.float64)
```
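torch.where is what picks between the two pieces of the smooth L1 loss. A sketch of the loss, in which the default beta=1/9 and the size_average flag are assumptions about this code base:

```python
import torch


def smooth_l1_loss(input, target, beta=1.0 / 9, size_average=True):
    n = torch.abs(input - target)
    # Quadratic below beta, linear above; both pieces equal 0.5 * beta at n == beta.
    loss = torch.where(n < beta, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    return loss.mean() if size_average else loss.sum()
```

With beta=0.1, an error of 0.05 costs 0.5 * 0.05² / 0.1 = 0.0125 and an error of 2.0 costs 2.0 - 0.05 = 1.95. Small errors are therefore treated like L2, while large errors get a constant L1 gradient.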

AnchorGenerator()

AnchorGenerator(sizes=(128,256,512),aspect_ratios=(0.5,1.0,2.0),anchor_strides=(8,16,32),straddle_thresh=0)

```python
import numpy as np
import torch


def _generate_anchors(base_size, scales, aspect_ratios):
    """Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, base_size - 1, base_size - 1) window.
    """
    # np.float was removed in NumPy 1.24; use np.float64
    anchor = np.array([1, 1, base_size, base_size], dtype=np.float64) - 1
    anchors = _ratio_enum(anchor, aspect_ratios)
    anchors = np.vstack(
        [_scale_enum(anchors[i, :], scales) for i in range(anchors.shape[0])]
    )
    return torch.from_numpy(anchors)
```

_ratio_enum(): from one anchor, generates one anchor per aspect ratio (3 with three ratios), keeping the area roughly constant

```python
def _ratio_enum(anchor, ratios):
    """Enumerate a set of anchors for each aspect ratio wrt an anchor."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))   # aspect_ratio means height / width
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
```

- _scale_enum(): for each aspect-ratio anchor, generates one anchor per scale (3 with three scales)

```python
def _scale_enum(anchor, scales):
    """Enumerate a set of anchors for each scale wrt an anchor."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
```

RPN()

First apply a 3×3 convolution (same number of output channels) followed by ReLU to each feature map, then feed the result into two branches (cls + bbox).

```python
import torch
import torch.nn.functional as F
from torch import nn


class RPNHead(nn.Module):
    """
    Adds a simple RPN Head with classification and regression heads
    """

    def __init__(self, cfg, in_channels, num_anchors):
        """
        Arguments:
            cfg              : config
            in_channels (int): number of channels of the input feature
            num_anchors (int): number of anchors to be predicted
        """
        super(RPNHead, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

        for l in [self.conv, self.cls_logits, self.bbox_pred]:  # weight initialization
            torch.nn.init.normal_(l.weight, std=0.01)
            torch.nn.init.constant_(l.bias, 0)

    def forward(self, x):
        logits = []    # one objectness logit per anchor at each location
        bbox_reg = []  # four box deltas per anchor at each location
        for feature in x:  # x: list of feature maps, one per FPN level
            t = F.relu(self.conv(feature))  # 3x3 conv + ReLU, then the cls / bbox reg branches
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
```

balanced_positive_negative_sampler():

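function: returns the positive and negative samples selected in each image. For each image it returns two tensors, pos_idx_per_image_mask and neg_idx_per_image_mask. In the positive mask, the entry at a box's index is 1 if that box was selected as a positive and 0 otherwise; the negative mask works the same way.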

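torch.nonzero(input, *, out=None, as_tuple=False)

Returns a tensor of shape (n, input.dim()), where n is the number of nonzero elements in input and each row is the index of one nonzero element.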

torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False)

Returns a random permutation of the integers 0 to n-1.
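torch.nonzero and torch.randperm combine into the per-image sampling sketched below. It assumes labels of 1 (positive), 0 (negative) and -1 (ignored) and returns bool masks instead of 0/1 uint8 masks. The name balanced_sample and its defaults (256, 0.5) are illustrative:

```python
import torch


def balanced_sample(labels_per_image, batch_size_per_image=256, positive_fraction=0.5):
    """labels: 1 = positive, 0 = negative, -1 = ignore. Returns (pos_masks, neg_masks)."""
    pos_masks, neg_masks = [], []
    for labels in labels_per_image:
        positive = torch.nonzero(labels >= 1).squeeze(1)
        negative = torch.nonzero(labels == 0).squeeze(1)
        # At most positive_fraction of the batch is positive; negatives fill the rest.
        num_pos = min(positive.numel(), int(batch_size_per_image * positive_fraction))
        num_neg = min(negative.numel(), batch_size_per_image - num_pos)
        pos_idx = positive[torch.randperm(positive.numel(), device=labels.device)[:num_pos]]
        neg_idx = negative[torch.randperm(negative.numel(), device=labels.device)[:num_neg]]
        pos_mask = torch.zeros_like(labels, dtype=torch.bool)
        neg_mask = torch.zeros_like(labels, dtype=torch.bool)
        pos_mask[pos_idx] = True
        neg_mask[neg_idx] = True
        pos_masks.append(pos_mask)
        neg_masks.append(neg_mask)
    return pos_masks, neg_masks
```

With 10 positives, 100 negatives, batch_size_per_image=20 and positive_fraction=0.25, each image yields 5 positives and 15 negatives.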

torch.split(tensor, split_size_or_sections, dim=0)
Splits a tensor into chunks. If split_size_or_sections is an integer, the chunks have that size (the last one may be smaller); if it is a list, the chunk sizes are the list elements.
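Both forms of torch.split are shown below. The list form is handy for cutting a tensor concatenated over several images back into per-image pieces:

```python
import torch

x = torch.arange(10)
chunks = torch.split(x, 4)         # equal-size chunks; the last one may be smaller
parts = torch.split(x, [2, 3, 5])  # explicit sizes; they must sum to x.size(0)
```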
