Receptive Field Block Net for Accurate and Fast Object Detection

Liu, Songtao; Huang, Di; Wang, Yunlong

doi:10.1007/978-3-030-01252-6_24

Cited by 1,216 publications

(741 citation statements)

References 40 publications

Supporting

Mentioning

644

Contrasting


“…Some representative CNN model architectures include AlexNet (Krizhevsky et al, 2012), ZFNet (Zeiler and Fergus, 2014), VGGNet (Simonyan and Zisserman, 2015), GoogLeNet, Inception series (Ioffe and Szegedy, 2015; Szegedy et al, 2017; Szegedy et al, 2016), ResNet, DenseNet (Huang et al, 2017) and SENet (Hu et al, 2018). Also, some researches have been widely explored to further improve the performance of deep learning based methods for object detection, such as feature enhancement (Cai et al, 2016; Cheng et al, 2019; Cheng et al, 2016b; Kong et al, 2016; Liu et al, 2017b), hard negative mining (Lin et al, 2017c;, contextual information fusion (Bell et al, 2016; Gidaris and Komodakis, 2015; Zhu et al, 2015b), modeling object deformations (Mordan et al, 2018; Ouyang et al, 2017; Xu et al, 2017), and so on.…”

Section: Regression-based Methods

mentioning

confidence: 99%

Object detection in optical remote sensing images: A survey and a new benchmark

Wan, Cheng, et al. 2020

ISPRS Journal of Photogrammetry and Remote Sensing

1,137

425


Substantial efforts have been devoted more recently to presenting various methods for object detection in optical remote sensing images. However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most of the existing datasets have some shortcomings, for example, the numbers of images and object categories are small scale, and the image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In this paper, we provide a comprehensive review of the recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name DIOR. The dataset contains 23463 images and 192472 instances, covering 20 object classes. The proposed DIOR dataset 1) is large-scale on the object categories, on the object instance number, and on the total image number; 2) has a large range of object size variations, not only in terms of spatial resolutions, but also in the aspect of inter- and intra-class size variability across objects; 3) holds big variations as the images are obtained with different imaging conditions, weather, seasons, and image quality; and 4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help researchers to develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research.

“…According to the theory of Receptive Fields (RFs) in human visual systems [63], [64], the diverse inputs are beneficial to extract distinctive features. However, from Fig.…”

Section: Distinctive Atrous Spatial Pyramid Pooling (DASPP)

mentioning

confidence: 99%

“…Firstly, the motivations of vortex pooling and DASPP are different. The proposed DASPP is motivated by the observation that the diverse inputs play an important role in extracting distinctive features [63]. Hence, DASPP takes advantage of the different sizes of pooling operations to generate the diverse inputs.…”

Section: Distinctive Atrous Spatial Pyramid Pooling (DASPP)

mentioning

confidence: 99%

Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes

Dong, Yan, Shen, et al. 2021

IEEE Trans. Intell. Transport. Syst.


Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding performance in semantic image segmentation. However, state-of-the-art DCNN-based semantic segmentation methods usually suffer from high computational complexity due to the use of complex network architectures. This greatly limits their applications in real-world scenarios that require real-time processing. In this paper, we propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes, which achieves a good trade-off between accuracy and speed. Specifically, a Lightweight Baseline Network with Atrous convolution and Attention (LBN-AA) is first used as our baseline network to efficiently obtain dense feature maps. Then, the Distinctive Atrous Spatial Pyramid Pooling (DASPP), which exploits the different sizes of pooling operations to encode the rich and distinctive semantic information, is developed to detect objects at multiple scales. Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information. Finally, a simple but practical Feature Fusion Network (FFN) is used to effectively combine both shallow and deep features from the semantic branch (DASPP) and the spatial branch (SPN), respectively. Extensive experimental results show that the proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps on the challenging Cityscapes and CamVid test datasets, respectively (using only a single NVIDIA TITAN X card). This demonstrates that the proposed method offers excellent performance at real-time speed for semantic segmentation of urban street scenes.
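The abstract above reports accuracy as mean Intersection over Union (mIoU). As a rough illustration of what that metric measures, the sketch below computes mIoU over flat lists of per-pixel class labels. The function name `mean_iou` and the toy labels are hypothetical and are not taken from the paper. Skipping classes that appear in neither prediction nor ground truth is an assumption of this sketch; benchmark toolkits typically accumulate counts over a whole dataset before averaging.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for flat lists of integer class labels.

    pred, target: equal-length sequences of class indices in [0, num_classes).
    Classes absent from both pred and target are skipped (an assumption here).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once in both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example with three classes and six "pixels" (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so a single misclassified pixel lowers the score of two classes at once.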


“…The authors of [18] point out that one-stage detectors suffer from the class imbalance problem between foregrounds and backgrounds, and propose focal loss which focuses on hard examples rather than easy ones. Furthermore, recent studies [35,20,39] have improved the performance both in accuracy and inference speed maintaining the efficiency of one-stage detectors.…”

Section: Object Detection

mentioning

confidence: 99%

Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection

Kim, Choi, Kim, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

195

116


Deep learning-based object detectors have shown remarkable improvements. However, supervised learning-based methods perform poorly when the training data and the test data have different distributions. To address the issue, domain adaptation transfers knowledge from the label-sufficient domain (source domain) to the label-scarce domain (target domain). Self-training is one of the powerful ways to achieve domain adaptation since it helps class-wise domain adaptation. Unfortunately, a naive approach that utilizes pseudo-labels as ground truth degrades the performance due to incorrect pseudo-labels. In this paper, we introduce a weak self-training (WST) method and adversarial background score regularization (BSR) for domain adaptive one-stage object detection. WST diminishes the adverse effects of inaccurate pseudo-labels to stabilize the learning procedure. BSR helps the network extract discriminative features for target backgrounds to reduce the domain shift. The two components are complementary to each other, as BSR enhances discrimination between foregrounds and backgrounds, whereas WST strengthens class-wise discrimination. Experimental results show that our approach effectively improves the performance of one-stage object detection in the unsupervised domain adaptation setting. (arXiv:1909.00597v1 [cs.CV])
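The citation statement for this entry mentions focal loss, which focuses training on hard examples rather than easy ones. A minimal sketch of the binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), is given below for orientation. The function name `binary_focal_loss`, the default values gamma=2.0 and alpha=0.25, and the sample probabilities are assumptions chosen for illustration; they do not come from this page.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single binary prediction (illustrative sketch).

    p: predicted probability of the foreground class, strictly in (0, 1).
    y: ground-truth label, 1 for foreground and 0 for background.
    gamma, alpha: focusing and balancing parameters (assumed defaults).
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A background anchor scored as 5% foreground is an easy example;
# one scored as 70% foreground is a hard example.
easy_background = binary_focal_loss(p=0.05, y=0)
hard_background = binary_focal_loss(p=0.70, y=0)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is one way to sanity-check an implementation.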
