YOLOCS: Effectively Reducing the Spatial Complexity of Feature Maps (with Paper Download)

Institute of Computer Vision

Paper: YOLOCS: Object Detection based on Dense Channel Compression for Feature Spatial Solidification (arxiv.org)

By compressing the spatial resolution of the feature map, both the accuracy and the speed of object detection are improved. The main contribution of the paper is a new feature spatial solidification method that reduces the time and space complexity of feature maps and so improves the efficiency and accuracy of object detection.

Summary

In the paper shared today, the researchers examine the association between channel features and convolution kernels during feature purification and gradient backpropagation, focusing on forward propagation and backpropagation within the network. Based on this analysis, they propose a feature spatial solidification method called dense channel compression. Following the core idea of the method, two new modules are introduced, one for the backbone network and one for the head network: the Dense Channel Compression for Feature Spatial Solidification Structure (DCFS) and the Asymmetric Multi-level Compression Decoupled Head (ADH). Integrated into the YOLOv5 model, the two modules perform remarkably well, and the improved model is called YOLOCS.

The large, medium and small YOLOCS models reach APs of 50.1%, 47.6% and 42.5%, respectively. While keeping inference speed very close to that of YOLOv5, they surpass the AP of the corresponding YOLOv5 models by 1.1, 2.3 and 5.2 percentage points, respectively.
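As a quick arithmetic check, the sketch below derives the YOLOv5 APs implied by these figures, assuming the reported gains are absolute differences in AP. The derived baselines are not quoted in this article, and the names `yolocs_ap`, `gain_over_yolov5` and `implied_baseline` are illustrative only.

```python
# AP values (%) reported for YOLOCS in the text above.
yolocs_ap = {"large": 50.1, "medium": 47.6, "small": 42.5}
# Reported AP gains over YOLOv5, read here as absolute percentage points
# (an assumption about how the gains are expressed).
gain_over_yolov5 = {"large": 1.1, "medium": 2.3, "small": 5.2}


def implied_baseline(ap, gain):
    """Return the baseline AP implied by a model's AP and its absolute gain."""
    return round(ap - gain, 1)


implied_yolov5_ap = {
    size: implied_baseline(yolocs_ap[size], gain_over_yolov5[size])
    for size in yolocs_ap
}

for size, ap in implied_yolov5_ap.items():
    print(f"{size}: implied YOLOv5 AP = {ap}%")
```

Read this way, the gain grows as the model gets smaller, with the small model benefiting the most.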

Background

In recent years, object detection has received extensive attention in computer vision. Among the available approaches, the Single Shot MultiBox Detector (SSD) and convolutional neural networks (CNNs) are two of the most commonly used. However, because SSD has relatively low accuracy and CNN-based detection has high computational complexity, finding an efficient and accurate object detection method has become a current research hotspot.

Dense Channel Compression (DCC) is a new convolutional neural network compression technique. It compresses the network parameters and accelerates the network by spatially solidifying the feature maps in the convolutional neural network. However, the application of DCC to object detection has not been fully studied.

Therefore, an object detection method based on Dense Channel Compression was proposed, named YOLOCS (YOLO with Dense Channel Compression). YOLOCS combines DCC with the YOLO (You Only Look Once) algorithm to achieve efficient, high-precision object detection. Specifically, YOLOCS uses DCC to spatially solidify the feature maps and so localize targets accurately. At the same time, it relies on the single-shot design of YOLO to classify targets quickly.

New framework

Dense Channel Compression for Feature Spatial Solidification Structure (DCFS)

In the proposed method (figure (c) above), the researchers not only address the balance between the width and depth of the network, but also compress the features from layers at different depths with 3×3 convolutions, halving the number of channels before the features are output and fused. This refines the features output by the different layers more thoroughly, which enhances the diversity and effectiveness of the features at the fusion stage.

In addition, the compressed features from each layer carry the weights of a larger convolution kernel (3×3), which effectively extends the receptive field of the output features. This method is referred to as dense channel compression for feature spatial solidification. Its rationale is to use a larger convolution kernel to perform channel compression. The technique has two key advantages. First, it expands the receptive field during forward propagation, so that region-relevant feature details are incorporated and feature loss during compression is minimized. Second, it enriches the error details during error backpropagation, which allows more accurate weight adjustments.

To further illustrate these two advantages, two channels are compressed using a convolution with two different kernel types (1×1 and 3×3), as shown in the figure below:
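The forward-propagation side of this comparison can also be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the helper `compress_channels`, the toy 3×3 inputs and the uniform kernel weights are all assumptions chosen to make the effect visible.

```python
def compress_channels(feature, kernel):
    """Compress a multi-channel feature map into a single channel.

    feature: list of channels, each an H x W list of lists.
    kernel:  one k x k weight grid per input channel (k odd).
    Zero padding keeps the output at H x W.
    """
    height, width = len(feature[0]), len(feature[0][0])
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for channel, weights in zip(feature, kernel):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            total += weights[dy][dx] * channel[yy][xx]
            out[y][x] = total
    return out


# Two 3x3 input channels; the only non-zero values sit next to the centre.
channel_a = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
channel_b = [[0, 0, 0], [0, 0, 2], [0, 0, 0]]
feature = [channel_a, channel_b]

# Uniform weights are an assumption made purely for illustration.
kernel_1x1 = [[[1.0]], [[1.0]]]                                  # 1 weight per channel
kernel_3x3 = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]   # 9 weights per channel

centre_1x1 = compress_channels(feature, kernel_1x1)[1][1]
centre_3x3 = compress_channels(feature, kernel_3x3)[1][1]
print(centre_1x1, centre_3x3)
```

With the 1×1 kernel the centre output sees only the two centre pixels, so the neighbouring activations are lost. The 3×3 kernel draws on all eighteen input positions, so the same information survives the compression from two channels to one.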

The following figure shows the network structure of DCFS. A three-layer bottleneck structure gradually compresses the channels during forward propagation. Half-channel 3×3 convolutions are applied to all branches, followed by batch normalization (BN) and an activation function layer. A 1×1 convolutional layer then compresses the output feature channels to match the input feature channels.
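The channel bookkeeping described above can be traced with a small sketch. It is a simplification under stated assumptions: the number of branches, the premise that every branch carries the full input width before compression, and the use of concatenation for fusion are not taken from the paper, and `dcfs_channel_plan` is a hypothetical helper.

```python
def conv_weights(c_in, c_out, k):
    """Weight count of a k x k convolution (bias and BN parameters ignored)."""
    return k * k * c_in * c_out


def dcfs_channel_plan(channels, num_branches):
    """Trace channel counts through a simplified DCFS-style block.

    Assumptions (not taken from the paper): every branch carries `channels`
    features before compression, and the compressed branches are concatenated.
    """
    half = channels // 2                 # each 3x3 convolution halves the channels
    fused = num_branches * half          # channels after concatenation
    weights = (
        num_branches * conv_weights(channels, half, 3)   # 3x3 compression per branch
        + conv_weights(fused, channels, 1)               # 1x1 back to the input width
    )
    return {"per_branch": half, "fused": fused, "output": channels, "weights": weights}


# 64 input channels and 4 branches are hypothetical values.
plan = dcfs_channel_plan(channels=64, num_branches=4)
print(plan)
```

In this sketch the final 1×1 convolution accounts for only a small share of the weights. Most of the capacity sits in the 3×3 compression convolutions, which is where the method places its emphasis.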

Asymmetric Multi-level Channel Compression Decoupled Head (ADH)

To address the problems of the decoupled head in the YOLOX model, the researchers conducted a series of studies and experiments. The results reveal a logical correlation between the use of the decoupled head structure and the associated loss function. Specifically, for different tasks, the structure of the decoupled head should be adjusted according to the complexity of the loss calculation. In addition, when the decoupled head structure is applied to various tasks, directly compressing the feature channels of the previous layer (as shown in the figure below) into the task channels may cause significant feature loss because of the difference in the final output dimensions. This, in turn, can hurt the overall performance of the model.

Moreover, in light of the proposed dense channel compression method for feature spatial solidification, directly reducing the number of channels in the final layer to match the output channels may cause feature loss during forward propagation and so degrade network performance. During backpropagation, the same structure may lead to suboptimal error propagation and hinder gradient stability. To address these challenges, a new decoupled head, called the asymmetric multi-level channel compression decoupled head, is introduced (Figure (b) below).

Specifically, the researchers deepen the network path dedicated to the target scoring task and use three convolutions to expand the receptive field and the number of parameters for that task. At the same time, the features of each convolutional layer are compressed along the channel dimension. This not only reduces the training difficulty of the target scoring task and improves model performance, but also greatly reduces the parameters and GFLOPs of the decoupled head, which significantly improves inference speed. In addition, a single convolutional layer is used to separate the classification and bounding box tasks. For matched positive samples, the losses associated with both tasks are relatively small, so over-expanding these branches is avoided.
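The effect of compressing channels along a three-convolution path can be estimated with a short weight count. The channel widths below are hypothetical, since the text gives no concrete numbers, and the constant-width path is only a reference point for comparison, not a reproduction of any particular head.

```python
def conv_weights(c_in, c_out, k=3):
    """Weight count of a k x k convolution (bias and BN parameters ignored)."""
    return k * k * c_in * c_out


def path_weights(widths, k=3):
    """Total weights of a chain of k x k convolutions with the given channel widths."""
    return sum(conv_weights(a, b, k) for a, b in zip(widths, widths[1:]))


# Hypothetical widths: three 3x3 convolutions in each path.
constant_path = [256, 256, 256, 256]    # no channel compression
compressed_path = [256, 128, 64, 32]    # channels halved at every layer

constant = path_weights(constant_path)
compressed = path_weights(compressed_path)
print(constant, compressed, compressed / constant)
```

Under these assumed widths the compressed path keeps all three convolutions, and with them the larger receptive field, while using well under a quarter of the weights of the constant-width path.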

Experiment visualization

Ablation Experiment on MS-COCO val2017

Comparison of YOLOCS, YOLOX and YOLOv5-r6.1 [7] in terms of AP on MS-COCO 2017 test-dev
