ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices

Qin, Zheng; Li, Zeming; Zhang, Zhaoning; Bao, Yiping; Yu, Gang; Peng, Yuxing; Sun, Jian

doi:10.1109/iccv.2019.00682

Cited by 232 publications

(142 citation statements)

References 31 publications

Citation statements classified as mentioning: 123

“…This is because these two detection heads are some of the first and seminal works on end-to-end trainable detection heads from the R-CNN family and do not require multi-stage progressive training such as R-CNN [27] and Fast R-CNN [28]. Furthermore, prior research [29], [37], [38] has demonstrated that region-proposal based detection heads are typically more accurate than unified framework based detection heads.…”

Section: Choice Of Object Detection Heads (mentioning)

confidence: 99%

Backward Compatible Object Detection Using HDR Image Content

et al. 2020


Convolutional Neural Network (CNN)-based object detection models have achieved unprecedented accuracy in challenging detection tasks. However, existing detection models (detection heads) trained on 8-bit-per-channel low dynamic range (LDR) images fail to detect relevant objects under lighting conditions where part of the image is underexposed or overexposed. Although this issue can be addressed by introducing High Dynamic Range (HDR) content and training existing detection heads on it, doing so faces several major challenges, such as the lack of annotated real-world HDR datasets and the extensive computational resources required for training and hyper-parameter search. In this paper, we introduce an alternative, backward-compatible methodology for detecting objects in challenging lighting conditions with existing CNN-based detection heads. This approach enables the use of HDR imaging without first creating annotated HDR datasets and without the associated expensive retraining. The proposed approach uses HDR imaging to capture relevant details in high-contrast scenes. The scene's dynamic range and wider colour gamut are then compressed using HDR to LDR mapping techniques so that salient highlight, shadow, and chroma details are preserved. Existing pre-trained models can then use the mapped LDR image to extract the features needed to detect objects in both the underexposed and overexposed regions of a scene. We also evaluate the feasibility of combining existing HDR to LDR mapping techniques with existing detection heads trained on standard detection datasets such as PASCAL VOC and MS COCO. Results show that the images obtained from the mapping techniques are suitable for object detection, and some of them can significantly outperform traditional LDR images.
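The abstract leaves the choice of HDR to LDR mapping open. As one concrete illustration, and not the authors' pipeline, the sketch below applies Reinhard et al.'s global tone-mapping operator. It scales luminance by the scene's log-average, compresses it with L/(1+L) so highlights roll off instead of clipping, and rescales the colour channels by the luminance ratio to keep chroma. The key value 0.18, the Rec. 709 luminance weights, the 2.2 display gamma and the function name `reinhard_global` are conventional or hypothetical choices assumed here.

```python
import numpy as np

def reinhard_global(hdr_rgb: np.ndarray, key: float = 0.18, eps: float = 1e-6) -> np.ndarray:
    """Map linear HDR RGB of shape (H, W, 3) to 8-bit LDR with Reinhard's global operator."""
    # Rec. 709 luminance (weights sum to 1, so gray pixels keep their value)
    lum = 0.2126 * hdr_rgb[..., 0] + 0.7152 * hdr_rgb[..., 1] + 0.0722 * hdr_rgb[..., 2]
    # log-average luminance summarizes the overall brightness of the scene
    log_avg = np.exp(np.mean(np.log(lum + eps)))
    scaled = key / log_avg * lum
    # L / (1 + L) compresses [0, inf) into [0, 1): shadows stay near-linear, highlights roll off
    lum_d = scaled / (1.0 + scaled)
    # rescale every channel by the same luminance ratio to preserve chroma
    ratio = lum_d / (lum + eps)
    ldr = np.clip(hdr_rgb * ratio[..., None], 0.0, 1.0)
    # approximate display gamma, then quantize to 8 bits for an LDR-trained detector
    ldr = ldr ** (1.0 / 2.2)
    return np.round(ldr * 255.0).astype(np.uint8)
```

On a gray ramp spanning six orders of magnitude (for example 0.01, 1, 100 and 10000), the mapped values stay strictly ordered and below the clipping point, so detail in both shadows and highlights survives quantization to 8 bits.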


“…Moreover, it consists of a Context Enhancement Module (CEM) and a Mobile Spatial Attention Module (MSAM). The key idea of CEM that leverages semantic and context information from multiple scales is to aggregate multi-scale local and global information to produce more discriminating features and the receptive field size plays an important role in CNN models [32]. CNNs can only capture information inside the receptive field.…”

Section: Plos One (mentioning)

confidence: 99%
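The excerpt names ThunderNet's two feature-level components. On our reading of the ThunderNet paper, CEM sums three context branches, each squeezed by a 1×1 convolution: C4, a 2× upsampled C5, and a globally average-pooled C5 that is broadcast over the map. SAM then reweights the CEM output with a sigmoid map computed from RPN features, roughly F_SAM = F_CEM · sigmoid(θ(F_RPN)), where θ is a 1×1 convolution. The NumPy sketch below rests on several illustrative assumptions: random weights in place of learned ones, nearest-neighbour upsampling, no normalization layers, toy channel counts, and hypothetical function names.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x: np.ndarray, out_ch: int) -> np.ndarray:
    # x has shape (C, H, W); a 1x1 convolution is a per-pixel linear map over channels.
    # Random weights stand in for learned ones (illustration only).
    w = rng.standard_normal((out_ch, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x: np.ndarray) -> np.ndarray:
    # nearest-neighbour 2x upsampling (assumed interpolation mode)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def cem(c4: np.ndarray, c5: np.ndarray, out_ch: int) -> np.ndarray:
    # Context Enhancement Module: local context at two scales plus global context
    p4 = conv1x1(c4, out_ch)                    # stride-16 branch
    p5 = upsample2x(conv1x1(c5, out_ch))        # stride-32 branch brought to stride 16
    glb = c5.mean(axis=(1, 2), keepdims=True)   # global average pooling -> (C, 1, 1)
    pg = conv1x1(glb, out_ch)                   # broadcast over H and W in the sum below
    return p4 + p5 + pg

def sam(f_cem: np.ndarray, f_rpn: np.ndarray) -> np.ndarray:
    # Spatial Attention Module: a sigmoid map from RPN features rescales the CEM features
    theta = conv1x1(f_rpn, f_cem.shape[0])      # match channel counts
    attention = 1.0 / (1.0 + np.exp(-theta))
    return f_cem * attention
```

Because the sigmoid lies in (0, 1), SAM can only attenuate CEM activations. As we understand the design, the intent is to down-weight background regions using RPN features, which are trained to separate foreground from background.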

3D car-detection based on a Mobile Deep Sensor Fusion Model and real-scene applications

Zhang et al. 2020

PLoS ONE


Unmanned vehicles need a comprehensive perception of their surroundings while driving, so perceiving this information is essential. In automotive perception, stereo vision plays a vital role in car detection: it can estimate the length, width, and height of a car, giving a more specific description of it. However, with existing technology, a single sensor cannot deliver accurate detection in complex environments. It is therefore particularly important to study sensing techniques based on multi-sensor fusion. Building on recent advances in deep learning for vision, this paper proposes and applies a deep-learning-based mobile sensor-fusion method, the Mobile Deep Sensor Fusion Model (MDSFM). First, the data are preprocessed by projecting 3D data onto 2D, forming a dataset suited to the model and making training more efficient. The LiDAR module uses a revised SqueezeNet structure to lighten the model and reduce its parameter count. The camera module uses an improved version of the R-CNN detection module with a Mobile Spatial Attention Module (MSAM). The fusion stage uses a dual-view deep fusion structure. The model is then validated on images selected from the KITTI dataset. Compared with other established methods, our model performs fairly well. Finally, the model is deployed as a ROS program on an experimental car, where it operates well. The results show that MDSFM significantly improves detection performance on easy cars. It also improves the quality of the detected data and the generalization ability of the car-detection model. In addition, it improves contextual relevance, preserves background information, and remains stable in driverless environments. Its application in a realistic scenario demonstrates that the model has good practical value.
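The abstract does not say which 3D-to-2D projection MDSFM uses. A common way to feed LiDAR to a 2D detector is a bird's-eye-view (BEV) raster. The sketch below builds a max-height BEV grid from an (N, 3) point cloud. The ranges, the 0.5 m cell size, the max-height encoding and the name `lidar_to_bev` are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def lidar_to_bev(points: np.ndarray,
                 x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell: float = 0.5) -> np.ndarray:
    """Rasterize LiDAR points (N, 3) in metres into a bird's-eye-view max-height grid."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # drop points outside the assumed region of interest
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]
    # forward distance -> grid row, lateral offset -> grid column
    rows = ((x - x_range[0]) / cell).astype(int)
    cols = ((y - y_range[0]) / cell).astype(int)
    h = int(round((x_range[1] - x_range[0]) / cell))
    w = int(round((y_range[1] - y_range[0]) / cell))
    bev = np.full((h, w), -np.inf)
    # unbuffered in-place max, so several points in one cell keep the highest z
    np.maximum.at(bev, (rows, cols), z)
    # empty cells become 0 (ambiguous with ground at z = 0; an occupancy channel would fix this)
    bev[np.isinf(bev)] = 0.0
    return bev
```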


“…In terms of simplifying neural networks, Mobilenet [23] is a lightweight neural network proposed by Google for mobile devices, which effectively reduces the amount of parameters and calculations. ShuffleNet [24], PeleeNet [25] and ThunderNet [26] enable the network model to be further optimized and become smaller and faster.…”

Section: Image Classification Network (mentioning)

confidence: 99%
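The savings the excerpt attributes to MobileNet come from depthwise separable convolutions. Consider a D_K × D_K kernel, M input channels, N output channels and a D_F × D_F output map. A standard convolution costs D_K·D_K·M·N·D_F·D_F multiply-adds. A depthwise convolution followed by a 1×1 pointwise convolution costs D_K·D_K·M·D_F·D_F + M·N·D_F·D_F, a ratio of 1/N + 1/D_K². The function names below are hypothetical, and the layer sizes in the example are arbitrary.

```python
def standard_conv_mult_adds(dk: int, m: int, n: int, df: int) -> int:
    # each of the N filters spans all M input channels at every one of DF*DF positions
    return dk * dk * m * n * df * df

def separable_conv_mult_adds(dk: int, m: int, n: int, df: int) -> int:
    depthwise = dk * dk * m * df * df  # one dk x dk filter per input channel
    pointwise = m * n * df * df        # 1x1 convolution mixing M channels into N
    return depthwise + pointwise

std = standard_conv_mult_adds(3, 256, 256, 14)
sep = separable_conv_mult_adds(3, 256, 256, 14)
print(std, sep, round(std / sep, 2))  # 115605504 13296640 8.69
```

For a 3 × 3 layer with 256 input and output channels on a 14 × 14 map, the separable version needs about 8.7× fewer multiply-adds. Because the 1/D_K² term dominates for wide layers, the savings approach a factor of D_K² (9 for 3 × 3 kernels).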

Robust Image Classification with Cognitive-Driven Color Priors

Zhu, Lan, et al. 2020

Electronics


Existing image classification methods based on convolutional neural networks usually learn classification features hierarchically from a large number of samples, which leads to over-fitting and layer-by-layer error propagation. As a result, they are vulnerable to adversarial samples generated by adding imperceptible perturbations to input samples. To address this issue, we propose a cognitive-driven color prior model, inspired by the characteristics of human memory, that memorizes the color attributes of target samples. At the inference stage, color priors are retrieved from memory and fused with convolutional neural network features to achieve robust image classification. The proposed color prior model is cognitive-driven and has no trainable parameters, so it generalizes well and can effectively defend against adversarial samples. In addition, our method directly combines the prior model's features with the classification probabilities of the convolutional neural network, without changing the structure or parameters of the existing network. It can also be combined with other adversarial defense methods, such as preprocessing modules (e.g., PixelDefense) or adversarial training, to further improve the robustness of image classification. Experiments on several benchmark datasets show that the proposed method improves the anti-interference ability of image classification algorithms.
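The abstract does not specify how the memorized color priors are indexed or fused with the network's output. Purely as a hypothetical illustration of fusing a training-free prior with CNN class probabilities, the sketch below stores one normalized RGB histogram per class and scores a test image by histogram intersection. It then mixes the normalized scores linearly with the softmax output. The histogram prior, the intersection measure, the linear fusion rule, the weight and all names are assumptions, not the authors' method.

```python
import numpy as np

def color_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized joint RGB histogram of an (H, W, 3) uint8 image (hypothetical prior)."""
    q = (img.astype(int) * bins) // 256                   # quantize each channel to `bins` levels
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def fuse_with_color_prior(cnn_probs: np.ndarray, img: np.ndarray,
                          class_hists: list, weight: float = 0.5) -> np.ndarray:
    # similarity of the image's colors to each class's stored ("memorized") histogram
    h = color_histogram(img)
    sim = np.array([np.minimum(h, ch).sum() for ch in class_hists])  # intersection in [0, 1]
    prior = (sim + 1e-9) / (sim + 1e-9).sum()              # eps avoids division by zero
    # hypothetical linear fusion; the network's structure and weights are left untouched
    fused = (1.0 - weight) * cnn_probs + weight * prior
    return fused / fused.sum()
```

Because the prior has no trainable parameters, it is computed from stored histograms at inference time and leaves the classifier unchanged, which matches the abstract's stated constraint even though the fusion rule itself is assumed.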
