Institute of Computer Vision
Paper: https://www.scitepress.org/Papers/2021/102344/102344.pdf
Special column of the Institute of Computer Vision
I. Introduction
Deep learning models have made major breakthroughs in object detection performance. However, conventional detectors such as Faster R-CNN and YOLO are large networks. This makes them difficult to deploy on embedded mobile devices, which have limited computing resources and tight power budgets.
The rapid progress of deep learning has greatly advanced object detection. Object detection is widely used in face detection, autonomous driving, robot vision, and video surveillance. In recent years, several deep convolutional neural network models have been proposed for this task, such as R-CNN, SSD, and YOLO. However, these networks have grown more complex and their model size has increased. As a result, they are increasingly difficult to deploy on real-world embedded devices. It is therefore important to develop efficient, fast object detection models that have fewer parameters but do not sacrifice detection quality.
II. Background
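As object detection networks keep becoming more complex, reducing weight parameters and computational cost has become important. Model compression methods fall into four groups: low-rank factorization, knowledge distillation, pruning, and quantization. Among these, pruning has been shown to reduce network complexity effectively by removing redundant parameters (A survey of model compression and acceleration for deep neural networks).
Several state-of-the-art techniques address this problem for object detection networks by reducing the number of parameters in the YOLO architecture. (YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers) developed the YOLO-Lite network, which removes the batch normalization layers from YOLOv2-tiny to speed up detection. This network achieves 33.81% mAP on PASCAL VOC 2007 and 12.26% mAP on COCO. (Yolo nano: a highly compact you only look once convolutional neural network for object detection) created YOLO-nano, a highly compact network. YOLO-nano is an 8-bit quantized model based on the YOLO network and optimized on PASCAL VOC 2007. It achieves a 3.18M model size and 69.1% mAP on PASCAL VOC 2007.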
III. Overview
To this end, the researchers propose Micro-YOLO, a new lightweight CNN-based object detector built on YOLOv3-tiny. It significantly reduces the number of parameters and the computational cost while maintaining detection performance. The researchers replace the convolutional layers of YOLOv3-tiny with two layer types: depthwise separable convolution (DSConv: https://arxiv.org/abs/1901.01928v1) and mobile inverted bottleneck convolution with squeeze-and-excitation blocks (MBConv, mainly derived from EfficientNet). They also design a progressive channel-level pruning algorithm to minimize the number of parameters while maximizing detection performance. Compared with the original YOLOv3-tiny network, Micro-YOLO reduces the number of parameters by 3.46× and the multiply-accumulate operations (MACs) by 2.55×. Meanwhile, the mAP evaluated on the COCO dataset drops by only 0.7%.
IV. Introduction of the New Framework
Micro-YOLO
To reduce the network size, the researchers explored alternative lightweight convolutional layers to replace the standard convolutional layers (Conv) in the YOLO network. MobileNet networks use two kinds of lightweight convolutional layers: (a) DSConv and (b) MBConv.
As shown in Figure (a) above, DSConv performs two kinds of convolution: (i) a depthwise convolution and (ii) a pointwise convolution. This significantly reduces the model size and computational cost of the network. As shown in Figure (b) above, MBConv consists of three stages: a 1×1 channel-expansion convolution, a depthwise convolution, and a 1×1 channel-reduction layer. MBConv also uses a squeeze-and-excitation (SE) block. This is a branch placed between the depthwise convolution and the channel-reduction layer. It consists of a global average pooling operation in the squeeze phase and two small FC layers in the excitation phase. Because the number of output channels differs from the number of input channels, the researchers removed the residual connection from MBConv. The MBConv layer keeps compact representations at its input and output. Internally, it expands the input into a higher-dimensional feature space, which increases the expressive power of the nonlinear transformation. As a result, the MBConv layer yields a better-compressed network than the DSConv layer without reducing detection accuracy. The computational costs of these layers, namely the Conv layer (Cs), the DSConv layer (Cds), and the MBConv layer (Cmb), can be expressed by the following formulas, respectively:
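The pure-Python sketch below makes the DSConv factorization concrete. It assumes stride 1, no padding, and no bias, and all function names are illustrative rather than taken from the paper. It applies a depthwise step followed by a pointwise step. It then checks that the result equals a standard convolution whose kernel factorizes as p[o][c]·d[c][i][j]. The depthwise step stores only one k×k filter per input channel, and the pointwise step stores one weight per input/output channel pair.

```python
"""Depthwise separable convolution (DSConv) as a factorized standard convolution.

Illustrative pure-Python sketch (stride 1, no padding, no bias), not code from
the Micro-YOLO paper. Feature maps are nested lists shaped [C][H][W].
"""
import random


def conv2d(x, w):
    """Standard convolution. x: [C_in][H][W], w: [C_out][C_in][k][k]."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    oh, ow = h - k + 1, wd - k + 1
    return [[[sum(w[o][c][i][j] * x[c][y + i][z + j]
                  for c in range(c_in) for i in range(k) for j in range(k))
              for z in range(ow)] for y in range(oh)] for o in range(len(w))]


def depthwise(x, d):
    """Depthwise step: one k x k filter per input channel. d: [C_in][k][k]."""
    k = len(d[0])
    oh, ow = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [[[sum(d[c][i][j] * x[c][y + i][z + j]
                  for i in range(k) for j in range(k))
              for z in range(ow)] for y in range(oh)] for c in range(len(x))]


def pointwise(x, p):
    """Pointwise (1x1) step that mixes channels. p: [C_out][C_in]."""
    h, wd = len(x[0]), len(x[0][0])
    return [[[sum(p[o][c] * x[c][y][z] for c in range(len(x)))
              for z in range(wd)] for y in range(h)] for o in range(len(p))]


def dsconv(x, d, p):
    """DSConv = depthwise convolution followed by pointwise convolution."""
    return pointwise(depthwise(x, d), p)


def rand_tensor(*shape):
    """Nested list of uniform random values with the given shape."""
    if len(shape) == 1:
        return [random.uniform(-1, 1) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]


if __name__ == "__main__":
    random.seed(0)
    c_in, c_out, k, size = 3, 4, 3, 6
    x = rand_tensor(c_in, size, size)
    d = rand_tensor(c_in, k, k)    # depthwise weights: k*k*c_in values
    p = rand_tensor(c_out, c_in)   # pointwise weights: c_in*c_out values
    # Equivalent standard kernel: full[o][c][i][j] = p[o][c] * d[c][i][j].
    full = [[[[p[o][c] * d[c][i][j] for j in range(k)] for i in range(k)]
             for c in range(c_in)] for o in range(c_out)]
    a, b = dsconv(x, d, p), conv2d(x, full)
    n = size - k + 1
    err = max(abs(a[o][y][z] - b[o][y][z])
              for o in range(c_out) for y in range(n) for z in range(n))
    print(f"max abs difference: {err:.1e}")
```

The check is numerical, so the reported difference is at the level of floating-point rounding. In a deep learning framework, the depthwise step is usually expressed as a grouped convolution whose group count equals the number of input channels, for example `torch.nn.Conv2d` with `groups=in_channels`.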
where k represents the kernel size, Cin represents the number of input channels, Cout represents the number of output channels, W and H represent the width and height of the feature map, and α and β represent the expansion and reduction factors in MBConv, respectively.
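The costs below are a reconstruction from the definitions above, so they may differ in detail from the paper's own expressions, for example in how the SE branch is counted. They assume standard multiply-accumulate counting: stride 1, a W×H output map, and biases, batch normalization, and activations ignored. They also assume β is the reduction ratio of the SE branch, so its hidden width is αC_in/β. Under these assumptions, the three costs can be written as:

```latex
\begin{aligned}
C_{s}  &= k^{2}\, C_{in}\, C_{out}\, W H \\
C_{ds} &= \left(k^{2} C_{in} + C_{in} C_{out}\right) W H \\
C_{mb} &= \left(\alpha C_{in}^{2} + k^{2} \alpha C_{in} + \alpha C_{in} C_{out}\right) W H + \frac{2 \alpha^{2} C_{in}^{2}}{\beta}
\end{aligned}
```

The first three terms of C_mb are the 1×1 expansion, the k×k depthwise convolution, and the 1×1 channel reduction. The last term covers the two FC layers of the SE branch. These act on pooled 1×1 features, so they carry no W H factor. Dividing C_ds by C_s gives 1/C_out + 1/k². For a 3×3 kernel with large C_out, DSConv therefore costs a little more than one ninth of a standard convolution.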
Progressive Channel Pruning
After fixing the architecture of the new Micro-YOLO network, the researchers further reduce its weight parameters with pruning. They use coarse-grained (channel-level) pruning because the DSConv and MBConv layers consist mainly of 1×1 kernels, which leaves little room for fine-grained pruning. (Rethinking the value of network pruning) suggests that the pruned architecture itself matters more for the efficiency of the final model than a set of inherited "important" weights. This implies that pruning can, in some cases, serve as an architecture search paradigm. The researchers therefore propose a progressive pruning method that searches for "thinner" architectures within the modified network. The pseudocode is as follows:
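The following is a hypothetical sketch of a progressive channel-pruning loop, not the authors' pseudocode. It rests on three assumptions:
- Channels are ranked by the L1 norm of their weights.
- Each step removes a fixed fraction of one layer's remaining channels.
- A step is kept only while a score stays within a tolerance of the unpruned baseline.

The toy evaluator measures retained weight magnitude. It stands in for retraining and measuring mAP. All names in the sketch are illustrative.

```python
"""Hypothetical sketch of progressive channel-level pruning (not the paper's code).

Assumptions: channels are ranked by the L1 norm of their weights, each step drops a
fixed fraction of one layer's remaining channels, and a step is accepted only while
the score stays within `tolerance` of the unpruned baseline.
"""
from typing import Callable, Dict, List

Layer = List[List[float]]  # one weight vector per output channel
Model = Dict[str, Layer]


def l1(channel: List[float]) -> float:
    return sum(abs(v) for v in channel)


def prune_layer(layer: Layer, fraction: float) -> Layer:
    """Return a copy without the weakest channels (by L1 norm); always keeps one."""
    n_drop = min(len(layer) - 1, max(1, int(len(layer) * fraction)))
    if n_drop <= 0:
        return layer
    ranked = sorted(range(len(layer)), key=lambda i: l1(layer[i]))
    dropped = set(ranked[:n_drop])
    return [ch for i, ch in enumerate(layer) if i not in dropped]


def progressive_prune(model: Model, evaluate: Callable[[Model], float],
                      fraction: float = 0.25, tolerance: float = 0.01) -> Model:
    """Thin each layer step by step until the next step would exceed `tolerance`."""
    model = dict(model)  # do not mutate the caller's model
    baseline = evaluate(model)
    for name in list(model):
        while True:
            thinner = prune_layer(model[name], fraction)
            if len(thinner) == len(model[name]):
                break  # the layer is down to a single channel
            candidate = {**model, name: thinner}
            if baseline - evaluate(candidate) > tolerance:
                break  # too costly: keep the current width for this layer
            model = candidate  # a real pipeline would retrain or fine-tune here
    return model


def make_toy_evaluator(reference: Model) -> Callable[[Model], float]:
    """Toy stand-in for mAP: share of the reference model's total L1 mass retained."""
    total = sum(l1(ch) for layer in reference.values() for ch in layer)
    return lambda m: sum(l1(ch) for layer in m.values() for ch in layer) / total


if __name__ == "__main__":
    net = {"conv1": [[1.0] * 4, [0.01] * 4, [0.5] * 4, [0.02] * 4],
           "conv2": [[2.0] * 4 for _ in range(4)]}
    thin = progressive_prune(net, make_toy_evaluator(net))
    print({name: len(layer) for name, layer in thin.items()})  # {'conv1': 2, 'conv2': 4}
```

The search is progressive because each layer is thinned one step at a time and stops at the first step that exceeds the tolerance. As a result, layers with many weak channels shrink, while layers whose channels all contribute keep their width.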
V. Experiments
Figure: Framework diagram of the proposed Micro-YOLO.
Figure: Number of parameters required by different convolution types for different numbers of input channels, at the same kernel size.
Figure: Number of parameters of different convolution types.
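The parameter counts compared in these figures can be approximated with simple counting functions. The sketch below rests on assumptions not stated in the post:
- Biases and batch-normalization parameters are ignored.
- MBConv has no residual connection.
- β is the SE reduction ratio.

The α and β values in the example are illustrative, not the paper's settings, and the function names are hypothetical.

```python
"""Approximate parameter counts of Conv, DSConv, and MBConv layers.

Illustrative sketch only. Assumptions (not from the Micro-YOLO paper): no biases
or batch-norm parameters, no residual connection in MBConv, and beta is the
squeeze-and-excitation reduction ratio (SE hidden width = alpha * c_in // beta).
"""


def conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k x c_in filter per output channel.
    return k * k * c_in * c_out


def dsconv_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise: one k x k filter per input channel; pointwise: 1x1 projection.
    return k * k * c_in + c_in * c_out


def mbconv_params(k: int, c_in: int, c_out: int, alpha: int, beta: int) -> int:
    c_mid = alpha * c_in              # width after the 1x1 expansion
    c_se = max(1, c_mid // beta)      # SE bottleneck width (assumed meaning of beta)
    expand = c_in * c_mid             # 1x1 channel expansion
    depthwise = k * k * c_mid         # k x k depthwise convolution
    se = 2 * c_mid * c_se             # two FC layers in the SE branch
    project = c_mid * c_out           # 1x1 channel reduction
    return expand + depthwise + se + project


if __name__ == "__main__":
    k, c_out, alpha, beta = 3, 128, 2, 4  # illustrative values
    print(f"{'C_in':>6} {'Conv':>10} {'DSConv':>10} {'MBConv':>10}")
    for c_in in (16, 32, 64, 128):
        print(f"{c_in:>6} {conv_params(k, c_in, c_out):>10} "
              f"{dsconv_params(k, c_in, c_out):>10} "
              f"{mbconv_params(k, c_in, c_out, alpha, beta):>10}")
```

The expanded width αC_in enters the expansion, SE, and projection terms. MBConv's count therefore grows quickly with α, and whether a given MBConv layer is smaller than a DSConv layer depends on the chosen factors.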
Figure: Kernel size exploration results. Each bar represents a different combination of kernel sizes; for simplicity, only the optimal kernel size combination is highlighted in red.
Finally, let's take a look at the detection results.