
Mask R-CNN PyTorch implementation

Personal notes on the Mask R-CNN PyTorch code.

1. Modules

batch_norm

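  • class FrozenBatchNorm2d():
  • Function: batch normalization with frozen statistics; the running mean/variance and the affine weight/bias are fixed and are not updated during training.
  • tensor.half(): converts a tensor to its half-precision (torch.float16) version

  • tensor.rsqrt(): element-wise reciprocal of the square root, 1 / sqrt(x) (not the square root itself)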
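The idea of a frozen batch norm can be sketched as follows. This is a minimal illustration, not the original class: the name `FrozenBatchNorm2dSketch` and the `eps` argument are assumptions made for this example.

```python
import torch
from torch import nn


class FrozenBatchNorm2dSketch(nn.Module):
    """Batch norm whose statistics and affine parameters never change (illustrative)."""

    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        # Buffers are stored with the module but are not trainable parameters.
        self.register_buffer("weight", torch.ones(num_channels))
        self.register_buffer("bias", torch.zeros(num_channels))
        self.register_buffer("running_mean", torch.zeros(num_channels))
        self.register_buffer("running_var", torch.ones(num_channels))
        self.eps = eps

    def forward(self, x):
        # rsqrt() computes 1 / sqrt(var + eps) in one call.
        scale = self.weight * (self.running_var + self.eps).rsqrt()
        shift = self.bias - self.running_mean * scale
        return x * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)


frozen_bn = FrozenBatchNorm2dSketch(3)
out = frozen_bn(torch.randn(2, 3, 4, 4))
```

Because everything is a buffer, an optimizer built from `parameters()` never sees these tensors.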
               

misc

  • helper class that supports empty tensors on some functions

backbone

  • resnet
  • resnet + fpn
  • retina + fpn

fpn

Function: builds the FPN starting from conv2, using lateral connections.

  • getattr(object, name[, default]): returns the value of the attribute called name on object (or default, when it is given and the attribute does not exist)
               

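- Upsampling is done by interpolation

  • F.interpolate(last_inner, scale_factor=2, mode="nearest")
     scale_factor is the multiplier applied to the spatial size (with scale_factor=n, both width and height become n times larger)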
               
  • nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
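One top-down merge step of an FPN might look like the sketch below. The channel counts, feature sizes and variable names other than `last_inner` are assumptions chosen only for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Hypothetical inputs: a finer backbone feature and a coarser FPN feature.
c4 = torch.randn(1, 512, 16, 16)
last_inner = torch.randn(1, 256, 8, 8)

lateral_conv = nn.Conv2d(512, 256, kernel_size=1)            # lateral 1x1 conv
output_conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # 3x3 conv on the merged map

# Nearest-neighbour upsampling doubles height and width: 8x8 -> 16x16.
top_down = F.interpolate(last_inner, scale_factor=2, mode="nearest")
inner = lateral_conv(c4) + top_down
p4 = output_conv(inner)

# A max pool with kernel_size=1 and stride=2 simply subsamples: 8x8 -> 4x4.
p_extra = nn.MaxPool2d(kernel_size=1, stride=2, padding=0)(last_inner)
```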
               

retinanet

Function: adds the RetinaNet head on each FPN output level; it outputs the bbox regression and classification results.

  • cls_tower[]: conv layers of the classification branch
cls_tower.append(
    nn.Conv2d(
        in_channels,
        in_channels,
        kernel_size=3,
        stride=1,
        padding=1
    )
)
           
  • bbox_tower[]: conv layers of the bbox regression branch
bbox_tower.append(
    nn.Conv2d(
        in_channels,
        in_channels,
        kernel_size=3,
        stride=1,
        padding=1
    )
)
           
  • cls_logits(): the final conv layer of the classification branch
self.cls_logits = nn.Conv2d(
    in_channels, num_anchors * num_classes, kernel_size=3, stride=1,
    padding=1
)
           
  • bbox_pred(): the final conv layer of the bbox regression branch
self.bbox_pred = nn.Conv2d(
    in_channels, num_anchors * 4, kernel_size=3, stride=1,
    padding=1
)
           
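  • RetinaNet parameter initialization

    1. Every layer except the last layer of the cls branch: the weights are drawn from a normal distribution with standard deviation 0.01 and the biases are set to 0;

    2. The last layer of the cls branch: the bias is initialized to -math.log((1 - prior_prob) / prior_prob); prior_prob is usually set to 0.01;

    3. The box regression weights are:

  • box_coder = BoxCoder(weights=(10., 10., 5., 5.))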
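The effect of the bias initialization can be checked with a small sketch. The values of `num_anchors`, `num_classes` and `in_channels` below are illustrative assumptions; the formula itself is the one given above.

```python
import math

import torch
from torch import nn

prior_prob = 0.01
num_anchors, num_classes, in_channels = 9, 80, 256  # illustrative values

cls_logits = nn.Conv2d(
    in_channels, num_anchors * num_classes, kernel_size=3, stride=1, padding=1
)
nn.init.normal_(cls_logits.weight, std=0.01)
bias_value = -math.log((1 - prior_prob) / prior_prob)
nn.init.constant_(cls_logits.bias, bias_value)

# sigmoid(bias_value) equals prior_prob, so with near-zero weights every
# anchor starts out predicting "foreground" with probability about 0.01.
initial_prob = torch.sigmoid(torch.tensor(bias_value)).item()
```

Starting from a low foreground probability keeps the very large number of background anchors from dominating the loss in the first iterations.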
               

box_coder

Function:

  • proposal [x, y, width, height]

    ------->ex_proposal

  • reference_box [x, y, width, height]

    ------->gt_box

  • weights: the four weights used for bbox regression
  • bbox regression targets
wx, wy, ww, wh = self.weights
targets_dx = wx * (gt_ctr_x - ex_ctr_x) / ex_widths
targets_dy = wy * (gt_ctr_y - ex_ctr_y) / ex_heights
targets_dw = ww * torch.log(gt_widths / ex_widths)
targets_dh = wh * torch.log(gt_heights / ex_heights)
           
  • Joining tensors along a given dimension:
    - torch.cat(tensors, dim=0, out=None) → Tensor
    Concatenates the tensors along an existing dimension; the result has the same number of dimensions as the inputs

>>> import torch
>>> a = torch.rand((2, 3))
>>> b = torch.rand((2, 3))
>>> c = torch.cat((a, b))
>>> a.size(), b.size(), c.size()
(torch.Size([2, 3]), torch.Size([2, 3]), torch.Size([4, 3]))

    - torch.stack(tensors, dim=0, out=None) → Tensor
    Joins the tensors along a new dimension, so the result has one more dimension than the inputs
           
>>> import torch
>>> a = torch.rand((2, 3))
>>> b = torch.rand((2, 3))
>>> c = torch.stack((a, b))
>>> a.size(), b.size(), c.size()
(torch.Size([2, 3]), torch.Size([2, 3]), torch.Size([2, 2, 3]))
           
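- torch.clamp(input, min, max, out=None) → Tensor
- Clamps every element of the input tensor to the range [min, max]: values below min become min and values above max become max; out is an optional output tensor

  • encode: computes the regression targets of a proposal relative to its gt box: targets [dx, dy, dw, dh]
  • decode: applies the regressed offsets to the reference boxes (the proposals or anchors, not the gt boxes) to obtain pred_boxes [x1, y1, x2, y2]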
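An encode/decode round trip can be sketched as follows. This is a simplified illustration, not the original BoxCoder: it assumes boxes in [x1, y1, x2, y2] form with width = x2 - x1, and the clipping constant `BBOX_XFORM_CLIP` and the helper `_size_and_center` are assumptions of this example.

```python
import math

import torch

WEIGHTS = (10.0, 10.0, 5.0, 5.0)
BBOX_XFORM_CLIP = math.log(1000.0 / 16)  # assumed upper bound for dw and dh


def _size_and_center(boxes):
    """Return widths, heights and centers of [x1, y1, x2, y2] boxes."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return w, h, boxes[:, 0] + 0.5 * w, boxes[:, 1] + 0.5 * h


def encode(gt_boxes, proposals, weights=WEIGHTS):
    wx, wy, ww, wh = weights
    ex_w, ex_h, ex_cx, ex_cy = _size_and_center(proposals)
    gt_w, gt_h, gt_cx, gt_cy = _size_and_center(gt_boxes)
    return torch.stack((
        wx * (gt_cx - ex_cx) / ex_w,
        wy * (gt_cy - ex_cy) / ex_h,
        ww * torch.log(gt_w / ex_w),
        wh * torch.log(gt_h / ex_h),
    ), dim=1)


def decode(codes, proposals, weights=WEIGHTS):
    wx, wy, ww, wh = weights
    ex_w, ex_h, ex_cx, ex_cy = _size_and_center(proposals)
    dx, dy = codes[:, 0] / wx, codes[:, 1] / wy
    # Clamp the log-scale offsets so exp() cannot produce huge boxes.
    dw = torch.clamp(codes[:, 2] / ww, max=BBOX_XFORM_CLIP)
    dh = torch.clamp(codes[:, 3] / wh, max=BBOX_XFORM_CLIP)
    cx, cy = dx * ex_w + ex_cx, dy * ex_h + ex_cy
    w, h = torch.exp(dw) * ex_w, torch.exp(dh) * ex_h
    return torch.stack((cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h), dim=1)


proposals = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
gt_boxes = torch.tensor([[2.0, 2.0, 12.0, 12.0]])
targets = encode(gt_boxes, proposals)     # regression targets for the proposal
pred_boxes = decode(targets, proposals)   # decoding the targets recovers the gt box
```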

Matcher()

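Function: assigns a label to each box.

There are three cases:

Above high_threshold: the box is matched to a ground-truth box (and takes its class);

Below low_threshold: labeled -1 (BELOW_LOW_THRESHOLD);

In between: labeled -2 (BETWEEN_THRESHOLDS).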
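The thresholding logic can be sketched as below. This is a simplified illustration rather than the original class: the function name `match`, the IoU values and the two sentinel values (-1 for "below the low threshold", -2 for "between the thresholds") are assumptions of this example.

```python
import torch

BELOW_LOW_THRESHOLD = -1   # assumed sentinel for clear negatives
BETWEEN_THRESHOLDS = -2    # assumed sentinel for ambiguous boxes


def match(iou, high_threshold, low_threshold):
    """iou has shape (num_gt, num_boxes); returns one label per box."""
    # For every box, find the best-overlapping gt box and its IoU.
    matched_vals, matches = iou.max(dim=0)
    below = matched_vals < low_threshold
    between = (matched_vals >= low_threshold) & (matched_vals < high_threshold)
    matches = matches.clone()
    matches[below] = BELOW_LOW_THRESHOLD
    matches[between] = BETWEEN_THRESHOLDS
    return matches


iou = torch.tensor([[0.9, 0.5, 0.1, 0.2],
                    [0.2, 0.6, 0.0, 0.8]])
labels = match(iou, high_threshold=0.7, low_threshold=0.3)
```

Boxes with a non-negative label carry the index of their matched gt box, from which the class label can be looked up.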

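  • torch.max(input, dim, keepdim=False, out=None)
      1. With dim specified: returns the maximum values along dim together with their indices;
      2. Without dim: returns the maximum over all elements of the tensor.

  • torch.range(start, end, step): includes end (deprecated in favor of torch.arange)
      torch.arange(start, end, step): excludes end

  • torch.squeeze(a, N): removes the dimensions of size 1; if the optional N is given, only dimension N is removed (and only when its size is 1);
      torch.unsqueeze(a, N): inserts a dimension of size 1 at position N;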
               
a = torch.randn(1,3)
b = torch.unsqueeze(a,1)  
c = a.unsqueeze(0)
d = torch.squeeze(c)
f = torch.randn(3)
g = f.unsqueeze(0)
#a.size=([1,3]),b.size=([1,1,3]),c.size=([1,1,3]),d.size=([3]),f.size=([3]),g.size=([1,3])

           

sigmoid_focal_loss()

def sigmoid_focal_loss_cpu(logits, targets, gamma, alpha):
    num_classes = logits.shape[1]
    dtype = targets.dtype
    device = targets.device
    class_range = torch.arange(1, num_classes+1, dtype=dtype, device=device).unsqueeze(0)    # class_range.size = ([1, num_classes])
        # class_range: the foreground class ids (background excluded)
    t = targets.unsqueeze(1)   # t.size = ([n, 1])
    p = torch.sigmoid(logits)
    term1 = (1 - p) ** gamma * torch.log(p)    # target class
    term2 = p ** gamma * torch.log(1 - p)      # non-target classes
    return -(t == class_range).float() * term1 * alpha - ((t != class_range) * (t >= 0)).float() * term2 * (1 - alpha)   # non-target classes are weighted by 1 - alpha
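A quick sanity check of this loss: with gamma = 0 and alpha = 0.5 it should reduce to half of the ordinary per-element binary cross-entropy, and targets of -1 should contribute nothing. The sketch below re-states the function compactly under the hypothetical name `focal_loss_sketch`; the input values are made up for illustration.

```python
import torch
import torch.nn.functional as F


def focal_loss_sketch(logits, targets, gamma, alpha):
    num_classes = logits.shape[1]
    class_range = torch.arange(1, num_classes + 1, device=targets.device).unsqueeze(0)
    t = targets.unsqueeze(1)                        # shape (n, 1)
    pos = (t == class_range).float()                # 1 where the class is the target
    neg = ((t != class_range) & (t >= 0)).float()   # 0 for ignored rows (t < 0)
    p = torch.sigmoid(logits)
    return (-pos * alpha * (1 - p) ** gamma * torch.log(p)
            - neg * (1 - alpha) * p ** gamma * torch.log(1 - p))


logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.0, 1.5, -2.0],
                       [1.0, 1.0, 1.0]])
targets = torch.tensor([1, 0, -1])   # class 1, background, ignored

loss = focal_loss_sketch(logits, targets, gamma=0.0, alpha=0.5)

# Reference: plain BCE against one-hot targets (background row is all zeros).
one_hot = (targets.unsqueeze(1) == torch.arange(1, 4).unsqueeze(0)).float()
bce = F.binary_cross_entropy_with_logits(logits, one_hot, reduction="none")
```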
           

smooth_l1_loss

  • torch.where(condition, x, y)   ->  Tensor

    Selects elements by condition: where condition is true the result takes the element of x, otherwise it takes the corresponding element of y
>>> x = torch.randn(3, 2)
>>> y = torch.ones(3, 2)
>>> x
tensor([[-0.4620,  0.3139],
        [ 0.3898, -0.7197],
        [ 0.0478, -0.1657]])
>>> torch.where(x > 0, x, y)
tensor([[ 1.0000,  0.3139],
        [ 0.3898,  1.0000],
        [ 0.0478,  1.0000]])
>>> x = torch.randn(2, 2, dtype=torch.double)
>>> x
tensor([[ 1.0779,  0.0383],
        [-0.8785, -1.1089]], dtype=torch.float64)
>>> torch.where(x > 0, x, 0.)
tensor([[1.0779, 0.0383],
        [0.0000, 0.0000]], dtype=torch.float64)
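A smooth L1 loss can be written with torch.where in a few lines. The sketch below is a minimal version under assumptions: the function name, the `beta` parameter with its default, and the sum reduction are choices made for this example.

```python
import torch


def smooth_l1_loss_sketch(input, target, beta=1.0):
    """Quadratic for small errors (< beta), linear for large ones."""
    n = torch.abs(input - target)
    loss = torch.where(n < beta, 0.5 * n ** 2 / beta, n - 0.5 * beta)
    return loss.sum()


pred = torch.tensor([0.0, 0.05, 2.0])
target = torch.zeros(3)
value = smooth_l1_loss_sketch(pred, target, beta=1.0)
```

Both branches are evaluated for every element; torch.where only chooses which result to keep.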
           

AnchorGenerator()

  • AnchorGenerator(sizes=(128,256,512),aspect_ratios=(0.5,1.0,2.0),anchor_strides=(8,16,32),straddle_thresh=0)
def _generate_anchors(base_size, scales, aspect_ratios):
    """Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, base_size - 1, base_size - 1) window.
    """
    anchor = np.array([1, 1, base_size, base_size], dtype=np.float64) - 1  # np.float was removed in NumPy 1.24
    anchors = _ratio_enum(anchor, aspect_ratios)
    anchors = np.vstack(
        [_scale_enum(anchors[i, :], scales) for i in range(anchors.shape[0])]
    )
    return torch.from_numpy(anchors)
           
  • _ratio_enum(): for a given anchor, generates one anchor per aspect ratio (3 anchors for 3 ratios)
def _ratio_enum(anchor, ratios):
    """Enumerate a set of anchors for each aspect ratio wrt an anchor."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))   # aspect ratio here means height / width
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
           

- _scale_enum(): for each aspect-ratio anchor, generates one anchor per scale (three sizes)

def _scale_enum(anchor, scales):
    """Enumerate a set of anchors for each scale wrt an anchor."""
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
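The two enumeration steps can be run end to end with the NumPy sketch below. The helpers `_whctrs` and `_mkanchors` are not shown in these notes, so the versions here are assumptions (a box-to-width/height/center conversion and its inverse, with the "+1" pixel convention); the base size and scales are illustrative.

```python
import numpy as np


def _whctrs(anchor):
    """Width, height and center of an [x1, y1, x2, y2] anchor (assumed helper)."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)


def _mkanchors(ws, hs, x_ctr, y_ctr):
    """Build [x1, y1, x2, y2] anchors around a center (assumed helper)."""
    ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
    return np.hstack((
        x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
        x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1),
    ))


def ratio_enum(anchor, ratios):
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = np.round(np.sqrt(w * h / ratios))   # keep the area roughly constant
    hs = np.round(ws * ratios)               # ratio = height / width
    return _mkanchors(ws, hs, x_ctr, y_ctr)


def scale_enum(anchor, scales):
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    return _mkanchors(w * scales, h * scales, x_ctr, y_ctr)


base = np.array([0, 0, 15, 15], dtype=np.float64)   # base_size = 16
ratios = np.array([0.5, 1.0, 2.0])
scales = np.array([8.0, 16.0, 32.0])

ratio_anchors = ratio_enum(base, ratios)                                 # 3 anchors
all_anchors = np.vstack([scale_enum(a, scales) for a in ratio_anchors])  # 3 x 3 = 9 anchors
```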
           

RPN()

  • A 3×3 convolution is first applied to the feature map (the number of channels is unchanged); its output is then fed to the two branches (cls + bbox)
class RPNHead(nn.Module):
    """
    Adds a simple RPN Head with classification and regression heads
    """

    def __init__(self, cfg, in_channels, num_anchors):
        """
        Arguments:
            cfg              : config
            in_channels (int): number of channels of the input feature
            num_anchors (int): number of anchors to be predicted
        """
        super(RPNHead, self).__init__()
        self.conv = nn.Conv2d(
            in_channels, in_channels, kernel_size=3, stride=1, padding=1
        )
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(
            in_channels, num_anchors * 4, kernel_size=1, stride=1
        )

        for l in [self.conv, self.cls_logits, self.bbox_pred]:  # initialize weights and biases
            torch.nn.init.normal_(l.weight, std=0.01)
            torch.nn.init.constant_(l.bias, 0)

    def forward(self, x):
        logits = []    # one map per feature level, with one objectness logit per anchor at each location
        bbox_reg = []
        for feature in x:
            t = F.relu(self.conv(feature))  # conv + relu, then the cls / bbox reg branches
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg
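To see the output shapes, here is a compact stand-in for the head, applied to two feature levels. The class name `TinyRPNHead`, the omission of `cfg`, and the feature sizes are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyRPNHead(nn.Module):
    """Shared 3x3 conv followed by 1x1 cls and bbox branches (illustrative)."""

    def __init__(self, in_channels, num_anchors):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, kernel_size=1, stride=1)
        self.bbox_pred = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1, stride=1)

    def forward(self, features):
        logits, bbox_reg = [], []
        for feature in features:   # the same weights are reused on every level
            t = F.relu(self.conv(feature))
            logits.append(self.cls_logits(t))
            bbox_reg.append(self.bbox_pred(t))
        return logits, bbox_reg


head = TinyRPNHead(in_channels=256, num_anchors=3)
features = [torch.randn(2, 256, 32, 32), torch.randn(2, 256, 16, 16)]
logits, bbox_reg = head(features)
```

Each level yields `num_anchors` objectness channels and `num_anchors * 4` box-offset channels at the same spatial resolution as its input.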
           

balanced_positive_negative_sampler():

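Function: returns the positive and negative samples selected in each image. Note: two tensors are returned per image, pos_idx_per_image_mask and neg_idx_per_image_mask. In the positive mask, the entry at a box's index is 1 if the box was selected as a positive and 0 otherwise; the negative mask works the same way.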

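  • torch.nonzero(input, *, out=None, as_tuple=False)

    Returns a tensor of shape (n, input.dim()), where n is the number of non-zero elements in input; each row is the index of one non-zero element
  • torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False)

    Returns a random permutation of the integers from 0 to n-1
  • torch.split(tensor, split_size_or_sections, dim=0)
    Splits the tensor into chunks. If split_size_or_sections is an integer, every chunk has that size along dim (the last one may be smaller); if it is a list, the tensor is split into chunks whose sizes along dim are the entries of the list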
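Put together, these calls are enough to sketch the sampling for one image. This is a simplified illustration, not the original sampler: the function name `sample_pos_neg`, its arguments, and the label convention (>= 1 positive, 0 negative, -1 ignored) are assumptions of this example.

```python
import torch


def sample_pos_neg(labels, batch_size_per_image, positive_fraction):
    """Return boolean masks marking the sampled positives and negatives."""
    positive = torch.nonzero(labels >= 1).squeeze(1)   # indices of positives
    negative = torch.nonzero(labels == 0).squeeze(1)   # indices of negatives

    # Cap positives at the requested fraction, fill the rest with negatives.
    num_pos = min(positive.numel(), int(batch_size_per_image * positive_fraction))
    num_neg = min(negative.numel(), batch_size_per_image - num_pos)

    # randperm shuffles the candidates; keep the first num_pos / num_neg.
    pos_idx = positive[torch.randperm(positive.numel())[:num_pos]]
    neg_idx = negative[torch.randperm(negative.numel())[:num_neg]]

    pos_mask = torch.zeros_like(labels, dtype=torch.bool)
    neg_mask = torch.zeros_like(labels, dtype=torch.bool)
    pos_mask[pos_idx] = True
    neg_mask[neg_idx] = True
    return pos_mask, neg_mask


labels = torch.tensor([1, 0, 0, -1, 1, 0, 0, 1, 0, 0])
pos_mask, neg_mask = sample_pos_neg(labels, batch_size_per_image=4, positive_fraction=0.5)

# torch.split with a list cuts a tensor into chunks of the given sizes.
chunks = torch.split(labels, [4, 6])
```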
               
