
Overhauled Yolo Framework | A New Framework for Object Detection with Very Low Energy Consumption (with Paper Download)

Author: Institute of Computer Vision

Welcome to the "Computer Vision Research Institute"


Special column of the Institute of Computer Vision

Author: Edison_G

When the conventional DNN-to-SNN conversion method is applied to the object detection domain, performance drops sharply. After in-depth analysis, two explanations are proposed: one is the inefficiency of layer-wise normalization, and the other is the lack of a representation for the negative activations of the leaky-ReLU function.

1. Preface

Over the past decade, deep neural networks (DNNs) have demonstrated significant performance in a variety of applications. As we try to solve tougher and more novel problems, an increasing demand for computing and power resources has become inevitable.

Spiking neural networks (SNNs), as the third generation of neural networks, have attracted wide interest due to their event-driven and low-power characteristics.


However, SNNs are difficult to train, mainly because of their complex neuronal dynamics and non-differentiable spike operations. In addition, their applications have been limited to relatively simple tasks such as image classification.

In today's sharing, the authors examine SNNs in a more challenging regression problem (i.e., object detection). Through in-depth analysis, two new methods are introduced: channel-wise normalization and signed neuron with imbalanced threshold, both of which provide fast information transmission for deep SNNs. Therefore, the first spike-based object detection model, called Spiking-YOLO, was proposed.

2. New framework contributions

Despite the many benefits of SNNs, they can currently handle only relatively simple tasks; because of the complex dynamics of neurons and the non-differentiable spike operations, no scalable training method exists yet. DNN-to-SNN conversion is a recent SNN training approach: a target DNN is first trained, then converted into an SNN that reuses the trained parameters. This achieves good performance in classification on small datasets, but the classification results on large datasets are not ideal.

This paper applies the DNN-to-SNN conversion method to the more complex field of object detection. Image classification only needs to pick a class, whereas object detection requires the network to predict accurate numerical values (e.g., bounding-box coordinates), which is much harder. After in-depth analysis, the paper faces two main problems in converting YOLO:

  • The commonly used SNN normalization method is too inefficient, resulting in a low spike firing rate. Since an SNN must set a threshold for spike emission, the weights should be normalized so that the threshold is easy to set; however, the commonly used layer-wise normalization is too inefficient for object detection, as explained in detail later.
  • There is no efficient implementation of leaky-ReLU in the SNN field. To convert YOLO to an SNN, leaky-ReLU, which YOLO uses extensively as an important structural element, must be supported, but no efficient conversion method exists so far.

This is a relatively niche, cutting-edge paper from South Korea; its research direction is the fusion of spiking neural networks with the YOLO algorithm. Korean groups seem particularly adept at creative modifications of YOLO, SSD, and similar detectors.

  • A Spiking-YOLO algorithm for fast and accurate information transmission in deep SNNs is proposed. This is the first time a deep SNN has been successfully applied to object detection tasks;
  • A fine-grained normalization technique for deep SNNs, called channel-wise normalization, is developed. It enables a higher firing rate across many neurons, facilitating fast and accurate information transmission;
  • A novel method, the signed neuron with imbalanced threshold, is proposed, which implements leaky-ReLU in SNNs. This opens opportunities for applying deep SNNs to a variety of models and applications.

3. New framework

Channel-wise data-based normalization

In an SNN, it is extremely important that neurons generate spike trains proportional to the magnitude of their input for lossless information transfer. However, within a fixed time window, over- or under-activation of neurons can cause information loss, depending on the setting of the threshold voltage: if it is set too high, neurons must accumulate membrane potential for a long time before firing a spike; if too low, they fire too many spikes. The firing rate is usually defined as the number of spikes emitted per timestep; the maximum firing rate is 100%, i.e., a spike is emitted at every timestep:

firing rate = N / T, where N is the total number of spikes emitted over T timesteps.

To prevent both over- and under-activation of neurons, the weights and the threshold voltage must be carefully chosen. Many studies have therefore proposed normalization methods, such as the commonly used layer-wise normalization (layer-norm). In this method, the weights of a layer are normalized by the maximum activation value of that layer, where w and λ denote the weights and the maximum value of the output feature map, respectively. After normalization, the neuron outputs are rescaled, which makes the threshold voltage easy to set. Since the maximum activation values are obtained from the training set, the test set must share the training set's distribution; moreover, the paper's experiments show that this conventional normalization leads to significant performance degradation on the object detection task.
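As a minimal sketch of the layer-wise normalization idea described above (function and variable names are illustrative, not from the paper's code), each layer's weights can be scaled by the ratio of the previous layer's maximum activation to its own:

```python
import numpy as np

def layer_wise_normalize(weights, max_acts):
    """Layer-wise normalization sketch: scale layer l's weights by
    lambda_{l-1} / lambda_l, where lambda_l is the maximum activation
    of layer l observed on the training set."""
    normed = []
    prev_max = 1.0  # assume the network input is already in [0, 1]
    for w, lam in zip(weights, max_acts):
        normed.append(w * prev_max / lam)
        prev_max = lam
    return normed

# toy example: two layers whose max activations are 4.0 and 8.0
weights = [np.ones((3, 3)), np.ones((3, 3))]
normed = layer_wise_normalize(weights, [4.0, 8.0])
print(normed[0][0, 0])  # 1.0 * 1.0 / 4.0 = 0.25
print(normed[1][0, 0])  # 1.0 * 4.0 / 8.0 = 0.5
```

After this rescaling, no neuron's activation exceeds 1, so a single threshold voltage can be shared by the whole layer.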

The proposed channel-wise normalization

In a traditional spiking neural network, neurons must generate spike trains according to the magnitude of their input; the weights and the threshold voltage jointly determine whether activation is adequate and balanced. Poor choices lead to under- or over-activation, causing information loss and poor performance.

The authors analyze and demonstrate in depth that fine-grained channel-wise normalization can raise the firing rate of neurons with very small activations. In other words, extremely small activations, when properly normalized, can transmit information accurately in less time. The paper argues that channel-wise normalization yields faster and more accurate deep SNNs, making it possible to apply them to more advanced machine learning problems.

[Figure: per-channel maximum activation values in each layer after layer-norm]

The figure above shows the maximum activation value of each channel in each layer after layer-norm; the blue and red lines are the average and minimum activation values of each layer, respectively. The normalized activation values within a layer vary widely: in general, layer-norm leaves many channels under-activated. This goes unnoticed in image classification, where only a class needs to be picked, but it matters for the regression in detection, which must predict accurate values. For example, transmitting the value 0.7 over 10 timesteps requires 7 spikes, whereas transmitting 0.007 accurately requires 7 spikes over 1000 timesteps. When timesteps are scarce, such a low firing rate loses information because too few spikes are emitted. The proposed normalization method is:

[Equation: channel-wise normalization — the weights of each channel are normalized by that channel's maximum activation value]

The whole process is as follows:

[Algorithm: channel-wise data-based normalization]

As shown in the diagram and algorithm above, channel-wise normalization eliminates the problem of extremely small activation values, i.e., it achieves a higher but still appropriate firing rate and transmits information accurately in a short time.
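The channel-wise scheme can be sketched as follows (a minimal illustration assuming a conv weight tensor of shape (out_ch, in_ch, kh, kw); names and shapes are illustrative, not the paper's code). Instead of one λ per layer, each channel carries its own maximum activation:

```python
import numpy as np

def channel_wise_normalize(w, prev_max_per_ch, max_per_ch):
    """Channel-wise normalization sketch for a conv weight tensor
    w of shape (out_ch, in_ch, kh, kw). Each input channel i is
    rescaled by the previous layer's per-channel maximum, and each
    output channel j is normalized by its own maximum activation."""
    w = w * prev_max_per_ch[None, :, None, None]  # undo previous layer's scaling
    w = w / max_per_ch[:, None, None, None]       # normalize this layer per channel
    return w

# toy example: 2 output channels, 1 input channel, 1x1 kernel
w = np.ones((2, 1, 1, 1))
out = channel_wise_normalize(w, np.array([1.0]), np.array([2.0, 0.5]))
print(out[:, 0, 0, 0])  # channel 0 scaled to 0.5, channel 1 scaled to 2.0
```

Note how the channel with the tiny maximum activation (0.5) gets its weights scaled up, raising its firing rate instead of leaving it under-activated as layer-norm would.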


Signed neuron featuring imbalanced threshold



A signed neuron with an imbalanced threshold (IBT) is introduced; it can represent both positive and negative activations, and it compensates for the leak term in the negative-activation region of leaky-ReLU. As shown in the figure below, the authors add a second threshold voltage responsible for responding to negative activations.

[Figure: signed neuron with imbalanced threshold responding to positive and negative activations]

The basic membrane dynamics of the signed neuron with IBT are given below.

[Equation: membrane dynamics of the signed neuron with IBT]

By using these signed neurons with IBT, leaky-ReLU can be realized in SNNs, allowing a wide variety of DNN models to be converted into SNNs.
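A minimal integrate-and-fire sketch of the IBT idea (my own illustration, not the paper's code; `v_th` and `alpha` are assumed parameters, with `alpha` standing for the leaky-ReLU negative slope): the positive threshold stays at +v_th, while the negative threshold is enlarged to -v_th/alpha, so negative spikes fire less often, mirroring the attenuation that leaky-ReLU applies to negative values.

```python
def signed_if_neuron(inputs, v_th=1.0, alpha=0.1):
    """Integrate-and-fire neuron with an imbalanced threshold (sketch).
    Emits +1 when the membrane potential reaches +v_th and -1 when it
    reaches the enlarged negative threshold -v_th/alpha; the membrane
    is reset by subtraction after each spike."""
    v, spikes = 0.0, []
    for x in inputs:
        v += x
        if v >= v_th:
            spikes.append(1)
            v -= v_th
        elif v <= -v_th / alpha:
            spikes.append(-1)
            v += v_th / alpha
        else:
            spikes.append(0)
    return spikes

print(signed_if_neuron([0.6, 0.6, -12.0]))  # [0, 1, -1]
```

With alpha = 0.1, a negative input must accumulate ten times more magnitude than a positive one before a spike fires, which is exactly the kind of imbalanced response the negative branch of leaky-ReLU requires.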

4. Experimental results and evaluation

The authors use Tiny YOLO as the real-time object detection model and implement its max-pooling and batch-normalization layers in the spiking neural network. Models were trained on PASCAL VOC 2007 and 2012 and tested on PASCAL VOC 2007. All code is based on the TensorFlow Eager framework, and experiments were run on a V100 GPU.

The experiments verify and analyze the usefulness of channel-wise normalization and of the signed neurons with IBT. As shown in the figure below, when both are used, Spiking-YOLO achieves 51.61% mAP, a comparatively high result.

[Figure: ablation of channel-wise normalization and signed neurons with IBT, with a comparison of output decoding schemes]

With layer-wise normalization alone, mAP is only 46.98%, while channel-wise normalization shows a clear advantage and faster convergence. Without either of the two proposed methods, Spiking-YOLO cannot detect targets at all; with only signed neurons, mAP is just 7.3%. This indicates that signed neurons compensate for the negative region of leaky-ReLU and play a key role in this high-numerical-precision problem for deep SNNs.

In the figure above, the authors also run additional comparative experiments on two output decoding schemes, one based on the accumulated membrane potential (Vth) and one based on the spike count. The results show that the membrane-potential-based scheme interprets the spike train more accurately and also converges faster.

[Figure: Tiny-YOLO to SNN conversion results with channel-norm and IBT]

The goal of this experiment was to transfer Tiny-YOLO to an SNN losslessly. As shown in the figure above, using channel-norm and IBT effectively improves performance while requiring fewer timesteps.

[Figure: comparison of membrane-potential-based and spike-count-based decoding]

The authors also tried different decoding methods, namely membrane potential and spike count. Because spike-count decoding must discard the sub-threshold remainder left in the membrane, it introduces error and information loss, so decoding based on the membrane potential is more accurate.
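The difference between the two decoding schemes can be made concrete with a small sketch (my own illustration under assumed names; `residual_v` is the sub-threshold potential left in the membrane after the last spike):

```python
def decode(spike_count, residual_v, v_th, timesteps):
    """Compare the two output decoding schemes (sketch).
    Spike-count decoding discards the sub-threshold remainder left in
    the membrane; membrane-potential decoding keeps it."""
    by_count = spike_count * v_th / timesteps
    by_membrane = (spike_count * v_th + residual_v) / timesteps
    return by_count, by_membrane

# a neuron that fired 7 spikes in 10 timesteps with 0.3*v_th left over
c, m = decode(spike_count=7, residual_v=0.3, v_th=1.0, timesteps=10)
print(c, m)  # 0.7 0.73
```

The spike-count estimate is quantized to multiples of v_th/T, while the membrane-potential estimate recovers the fractional remainder, which matters for the precise values a regression head must output.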


5. Summary

In today's sharing, the authors propose Spiking-YOLO, the first SNN model to successfully perform object detection, achieving results similar to the original DNN on the non-trivial PASCAL VOC and MS COCO datasets.

In my opinion, this research represents the first step in solving more advanced machine learning problems in deep SNNs.

© THE END

Please contact this official account for reprint authorization.


The Computer Vision Research Institute Learning Group is waiting for you to join!

ABOUT

Institute of Computer Vision

The Institute of Computer Vision works mainly in the field of deep learning, focusing on object detection, object tracking, image segmentation, OCR, model quantization, model deployment, and other research directions. The institute shares the latest paper algorithms and new frameworks every day, provides one-click paper downloads, and shares practical projects. It emphasizes both "technical research" and "practical implementation," sharing hands-on processes across different fields so that everyone can truly experience real-world scenarios beyond theory and cultivate habits of hands-on programming and active thinking!


Reply "SYolo" to receive the paper
