
Overhauled Yolo Framework | A New Framework for Object Detection with Very Low Energy Consumption (with Paper Download)

Author: Institute of Computer Vision

Welcome to the "Computer Vision Research Institute"


Special column of the Institute of Computer Vision

Author: Edison_G

When the conventional DNN-to-SNN conversion method is applied to the object detection domain, performance drops sharply. After in-depth analysis, two explanations are proposed: one is the inefficiency of layer-wise normalization, and the other is the lack of a representation for the negative activations of the leaky-ReLU function.

1. Preface

Over the past decade, deep neural networks (DNNs) have demonstrated significant performance in a variety of applications. As we try to solve tougher and more novel problems, an increasing demand for computing and power resources has become inevitable.

Spiking neural networks (SNNs), as the third generation of neural networks, have attracted wide interest due to their event-driven and low-power characteristics.


However, SNNs are difficult to train, mainly because of their complex neuronal dynamics and non-differentiable spike operations. In addition, their applications have been limited to relatively simple tasks such as image classification.

In today's sharing, the authors examine SNNs in a more challenging regression problem (i.e., object detection). Through in-depth analysis, two new methods are introduced: channel-wise normalization and signed neuron with imbalanced threshold, both of which provide fast information transmission for deep SNNs. Therefore, the first spike-based object detection model, called Spiking-YOLO, was proposed.

2. New framework contributions

Despite the many benefits of SNNs, they can currently handle only relatively simple tasks; because of the complex dynamics of neurons and the non-differentiable spike operations, no scalable training method exists yet. DNN-to-SNN conversion is a recent SNN training approach: a target DNN is first trained, then converted into an SNN that reuses the trained parameters. This achieves good performance in classification on small datasets, but the classification results on large datasets are not ideal.

This paper applies the DNN-to-SNN conversion method to the more complex field of object detection. Image classification only needs to pick a class, whereas object detection requires the network to predict accurate numerical values (e.g., bounding-box coordinates), which is much harder. After in-depth analysis, the paper faces two main problems in converting YOLO:

  • The commonly used SNN normalization method is too inefficient, resulting in a low spike firing rate. Since an SNN must set a threshold for spike emission, the weights should be normalized so that the threshold is easy to set; however, the commonly used layer-wise normalization is too inefficient for object detection, as explained in detail later.
  • There is no efficient implementation of leaky-ReLU in the SNN field. To convert YOLO to an SNN, leaky-ReLU, which YOLO uses extensively as an important structural element, must be supported, but no efficient conversion method exists so far.

This is a relatively niche, cutting-edge paper from South Korea; its research direction is the fusion of spiking neural networks with the YOLO algorithm. Korean groups seem particularly adept at creative modifications of YOLO, SSD, and similar detectors.

  • A Spiking-YOLO algorithm for fast and accurate information transmission in deep SNNs is proposed. This is the first time a deep SNN has been successfully applied to object detection tasks;
  • A fine-grained normalization technique for deep SNNs, called channel-wise normalization, is developed. It enables a higher firing rate across many neurons, facilitating fast and accurate information transmission;
  • A novel method, the signed neuron with imbalanced threshold, is proposed, which implements leaky-ReLU in SNNs. This opens opportunities for applying deep SNNs to a variety of models and applications.

3. New framework

Channel-wise data-based normalization

In an SNN, it is extremely important that neurons generate spike trains proportional to the magnitude of their input for lossless information transfer. However, within a fixed time window, over- or under-activation of neurons can cause information loss, depending on the setting of the threshold voltage: if it is set too high, neurons must accumulate membrane potential for a long time before firing a spike; if too low, they fire too many spikes. The firing rate is usually defined as the number of spikes emitted per timestep; the maximum firing rate is 100%, i.e., a spike is emitted at every timestep:

firing rate = N / T, where N is the total number of spikes emitted over T timesteps.

To prevent both over- and under-activation of neurons, the weights and the threshold voltage must be carefully chosen. Many studies have therefore proposed normalization methods, such as the commonly used layer-wise normalization (layer-norm). In this method, the weights of a layer are normalized by the maximum activation value of that layer, where w and λ denote the weights and the maximum value of the output feature map, respectively. After normalization, the neuron outputs are rescaled, which makes the threshold voltage easy to set. Since the maximum activation values are obtained from the training set, the test set must share the training set's distribution; moreover, the paper's experiments show that this conventional normalization leads to significant performance degradation on the object detection task.
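As a minimal sketch of the layer-wise normalization idea described above (function and variable names are illustrative, not from the paper's code), each layer's weights can be scaled by the ratio of the previous layer's maximum activation to its own:

```python
import numpy as np

def layer_wise_normalize(weights, max_acts):
    """Layer-wise normalization sketch: scale layer l's weights by
    lambda_{l-1} / lambda_l, where lambda_l is the maximum activation
    of layer l observed on the training set."""
    normed = []
    prev_max = 1.0  # assume the network input is already in [0, 1]
    for w, lam in zip(weights, max_acts):
        normed.append(w * prev_max / lam)
        prev_max = lam
    return normed

# toy example: two layers whose max activations are 4.0 and 8.0
weights = [np.ones((3, 3)), np.ones((3, 3))]
normed = layer_wise_normalize(weights, [4.0, 8.0])
print(normed[0][0, 0])  # 1.0 * 1.0 / 4.0 = 0.25
print(normed[1][0, 0])  # 1.0 * 4.0 / 8.0 = 0.5
```

After this rescaling, no neuron's activation exceeds 1, so a single threshold voltage can be shared by the whole layer.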

The proposed channel-wise normalization

In a traditional spiking neural network, neurons must generate spike trains according to the magnitude of their input; the weights and the threshold voltage jointly determine whether activation is adequate and balanced. Poor choices lead to under- or over-activation, causing information loss and poor performance.

The authors analyze and demonstrate in depth that fine-grained channel-wise normalization can raise the firing rate of neurons with very small activations. In other words, extremely small activations, when properly normalized, can transmit information accurately in less time. The paper argues that channel-wise normalization yields faster and more accurate deep SNNs, making it possible to apply them to more advanced machine learning problems.

[Figure: per-channel maximum activation values in each layer after layer-norm]

The figure above shows the maximum activation value of each channel in each layer after layer-norm; the blue and red lines are the average and minimum activation values of each layer, respectively. The normalized activation values within a layer vary widely: in general, layer-norm leaves many channels under-activated. This goes unnoticed in image classification, where only a class needs to be picked, but it matters for the regression in detection, which must predict accurate values. For example, transmitting the value 0.7 over 10 timesteps requires 7 spikes, whereas transmitting 0.007 accurately requires 7 spikes over 1000 timesteps. When timesteps are scarce, such a low firing rate loses information because too few spikes are emitted. The proposed normalization method is:

[Equation: channel-wise normalization — the weights of each channel are normalized by that channel's maximum activation value]

The whole process is as follows:

[Algorithm: channel-wise data-based normalization]

As shown in the diagram and algorithm above, channel-wise normalization eliminates the problem of extremely small activation values, i.e., it achieves a higher but still appropriate firing rate and transmits information accurately in a short time.
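The channel-wise scheme can be sketched as follows (a minimal illustration assuming a conv weight tensor of shape (out_ch, in_ch, kh, kw); names and shapes are illustrative, not the paper's code). Instead of one λ per layer, each channel carries its own maximum activation:

```python
import numpy as np

def channel_wise_normalize(w, prev_max_per_ch, max_per_ch):
    """Channel-wise normalization sketch for a conv weight tensor
    w of shape (out_ch, in_ch, kh, kw). Each input channel i is
    rescaled by the previous layer's per-channel maximum, and each
    output channel j is normalized by its own maximum activation."""
    w = w * prev_max_per_ch[None, :, None, None]  # undo previous layer's scaling
    w = w / max_per_ch[:, None, None, None]       # normalize this layer per channel
    return w

# toy example: 2 output channels, 1 input channel, 1x1 kernel
w = np.ones((2, 1, 1, 1))
out = channel_wise_normalize(w, np.array([1.0]), np.array([2.0, 0.5]))
print(out[:, 0, 0, 0])  # channel 0 scaled to 0.5, channel 1 scaled to 2.0
```

Note how the channel with the tiny maximum activation (0.5) gets its weights scaled up, raising its firing rate instead of leaving it under-activated as layer-norm would.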


Signed neuron featuring imbalanced threshold



A signed neuron with an imbalanced threshold (IBT) is introduced; it can represent both positive and negative activations, and it compensates for the leak term in the negative-activation region of leaky-ReLU. As shown in the figure below, the authors add a second threshold voltage responsible for responding to negative activations.

[Figure: signed neuron with imbalanced threshold responding to positive and negative activations]

The basic membrane dynamics of the signed neuron with IBT are given below.

[Equation: membrane dynamics of the signed neuron with IBT]

By using these signed neurons with IBT, leaky-ReLU can be realized in SNNs, allowing a wide variety of DNN models to be converted into SNNs.
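A minimal integrate-and-fire sketch of the IBT idea (my own illustration, not the paper's code; `v_th` and `alpha` are assumed parameters, with `alpha` standing for the leaky-ReLU negative slope): the positive threshold stays at +v_th, while the negative threshold is enlarged to -v_th/alpha, so negative spikes fire less often, mirroring the attenuation that leaky-ReLU applies to negative values.

```python
def signed_if_neuron(inputs, v_th=1.0, alpha=0.1):
    """Integrate-and-fire neuron with an imbalanced threshold (sketch).
    Emits +1 when the membrane potential reaches +v_th and -1 when it
    reaches the enlarged negative threshold -v_th/alpha; the membrane
    is reset by subtraction after each spike."""
    v, spikes = 0.0, []
    for x in inputs:
        v += x
        if v >= v_th:
            spikes.append(1)
            v -= v_th
        elif v <= -v_th / alpha:
            spikes.append(-1)
            v += v_th / alpha
        else:
            spikes.append(0)
    return spikes

print(signed_if_neuron([0.6, 0.6, -12.0]))  # [0, 1, -1]
```

With alpha = 0.1, a negative input must accumulate ten times more magnitude than a positive one before a spike fires, which is exactly the kind of imbalanced response the negative branch of leaky-ReLU requires.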

4. Experimental results and evaluation

The authors use Tiny YOLO as the real-time object detection model and implement its max-pooling and batch-normalization layers in the spiking neural network. Models were trained on PASCAL VOC 2007 and 2012 and tested on PASCAL VOC 2007. All code is based on the TensorFlow Eager framework, and experiments were run on a V100 GPU.

The experiments verify and analyze the usefulness of channel-wise normalization and of the signed neurons with IBT. As shown in the figure below, when both are used, Spiking-YOLO achieves 51.61% mAP, a comparatively high result.

[Figure: ablation of channel-wise normalization and signed neurons with IBT, with a comparison of output decoding schemes]

With layer-wise normalization alone, mAP is only 46.98%, while channel-wise normalization shows a clear advantage and faster convergence. Without either of the two proposed methods, Spiking-YOLO cannot detect targets at all; with only signed neurons, mAP is just 7.3%. This indicates that signed neurons compensate for the negative region of leaky-ReLU and play a key role in this high-numerical-precision problem for deep SNNs.

In the figure above, the authors also run additional comparative experiments on two output decoding schemes, one based on the accumulated membrane potential (Vth) and one based on the spike count. The results show that the membrane-potential-based scheme interprets the spike train more accurately and also converges faster.

[Figure: Tiny-YOLO to SNN conversion results with channel-norm and IBT]

The goal of this experiment was to transfer Tiny-YOLO to an SNN losslessly. As shown in the figure above, using channel-norm and IBT effectively improves performance while requiring fewer timesteps.

[Figure: comparison of membrane-potential-based and spike-count-based decoding]

The authors also tried different decoding methods, namely membrane potential and spike count. Because spike-count decoding must discard the sub-threshold remainder left in the membrane, it introduces error and information loss, so decoding based on the membrane potential is more accurate.
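The difference between the two decoding schemes can be made concrete with a small sketch (my own illustration under assumed names; `residual_v` is the sub-threshold potential left in the membrane after the last spike):

```python
def decode(spike_count, residual_v, v_th, timesteps):
    """Compare the two output decoding schemes (sketch).
    Spike-count decoding discards the sub-threshold remainder left in
    the membrane; membrane-potential decoding keeps it."""
    by_count = spike_count * v_th / timesteps
    by_membrane = (spike_count * v_th + residual_v) / timesteps
    return by_count, by_membrane

# a neuron that fired 7 spikes in 10 timesteps with 0.3*v_th left over
c, m = decode(spike_count=7, residual_v=0.3, v_th=1.0, timesteps=10)
print(c, m)  # 0.7 0.73
```

The spike-count estimate is quantized to multiples of v_th/T, while the membrane-potential estimate recovers the fractional remainder, which matters for the precise values a regression head must output.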


5. Summary

In today's sharing, the authors propose Spiking-YOLO, the first SNN model to successfully perform object detection, achieving results similar to the original DNN on the non-trivial PASCAL VOC and MS COCO datasets.

In my opinion, this research represents the first step in solving more advanced machine learning problems in deep SNNs.

© THE END

Please contact this official account for reprint authorization.


The Computer Vision Research Institute Learning Group is waiting for you to join!

ABOUT

Institute of Computer Vision

The Institute of Computer Vision works mainly in the field of deep learning, focusing on object detection, object tracking, image segmentation, OCR, model quantization, model deployment, and other research directions. The institute shares the latest paper algorithms and new frameworks every day, provides one-click paper downloads, and shares practical projects. It emphasizes both "technical research" and "practical implementation," sharing hands-on processes across different fields so that everyone can truly experience real-world scenarios beyond theory and cultivate habits of hands-on programming and active thinking!


Reply "SYolo" to receive the paper
