2019
DOI: 10.3390/rs11050594

A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection

Abstract: With the rapid advances in remote-sensing technologies and the growing number of satellite images, fast and effective object detection plays an important role in understanding and analyzing image information, which can be further applied in civilian and military fields. Recently, object detection methods based on region-based convolutional neural networks have shown excellent performance. However, these two-stage methods contain region proposal generation and object detection procedures, resulting in low computatio…

Cited by 35 publications (38 citation statements)
References 44 publications (61 reference statements)
“…The atrous filter enlarges the field of view (FOV) without increasing the number of parameters to be computed, and thus saves computational resources. Additionally, high-level features containing semantic information and low-level features containing fine details are fused together through up-sampling and concatenation [28], so that features from different layers are all considered in detection tasks, especially small-object detection. In a similar model [29], a deep residual network (ResNet) is used as the encoder, and high-level features are combined with the corresponding low-level features in the up-sampling stage of the decoder.…”
Section: Multi-scale
confidence: 99%
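The two ideas in the statement above (dilated convolution to widen the receptive field without extra parameters, and fusion of coarse semantic features with fine low-level features by up-sampling and concatenation) can be illustrated with a minimal PyTorch sketch. The module name, channel sizes, and dilation rate below are illustrative assumptions, not the exact architectures of the cited papers.

```python
# Minimal sketch: atrous (dilated) 3x3 conv + up-sample-and-concatenate fusion.
# All names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AtrousFusionBlock(nn.Module):
    def __init__(self, low_ch: int, high_ch: int, out_ch: int, dilation: int = 2):
        super().__init__()
        # Same kernel size and channel count as an ordinary 3x3 conv, so the
        # parameter count is unchanged; only the sampling grid is dilated,
        # which enlarges the field of view.
        self.atrous = nn.Conv2d(high_ch, high_ch, kernel_size=3,
                                padding=dilation, dilation=dilation)
        # 1x1 conv to merge the concatenated low-level + high-level features.
        self.merge = nn.Conv2d(low_ch + high_ch, out_ch, kernel_size=1)

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        high = F.relu(self.atrous(high))
        # Up-sample the semantically strong but spatially coarse map to the
        # resolution of the detail-rich low-level map, then concatenate.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        fused = torch.cat([low, high], dim=1)
        return F.relu(self.merge(fused))


if __name__ == "__main__":
    low = torch.randn(1, 64, 80, 80)    # fine, low-level features
    high = torch.randn(1, 256, 20, 20)  # coarse, high-level features
    fused = AtrousFusionBlock(64, 256, 128)(low, high)
    print(fused.shape)  # torch.Size([1, 128, 80, 80])
```

The fused map keeps the spatial resolution of the low-level input while carrying the semantic content of the high-level input, which is why this pattern is commonly used for small-object detection.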
“…The multi-scale concept in this study refers to the relationship between the local and the global, that is, between a small local area and a larger area within a certain neighborhood. However, current multi-scale research focuses on multi-scale feature extraction and fusion within the same training sample image [24][25][26][27][28][29][30][31]. There are few studies on how to handle the semantic analysis of variable regions over different spatial extents [32][33][34][35].…”
Section: Introduction
confidence: 99%
“…On this basis, object detection in remote-sensing imagery has been widely studied in recent years [27][28][29][30][31][32]. In the field of remote sensing, many researchers have devoted great effort to CNN-based object detection methods [33][34][35][36][37][38][39].…”
Section: Introduction
confidence: 99%
“…In addition, the final detection results are obtained by performing decision fusion on the outputs of the three sub-networks. Building on YOLOv2 [23], a single-shot geospatial object detection framework with multi-scale feature fusion modules was proposed in [33]. Note that the detectors in this model are used in conjunction with the multi-scale feature fusion modules.…”
Section: Introduction
confidence: 99%
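The statement above describes two pieces: a YOLO-style single-shot detector that predicts boxes densely from a fused feature map in one forward pass, and a decision-fusion step that merges the results of several sub-networks. The sketch below illustrates both in PyTorch; the head layout, anchor count, class count, and NMS threshold are assumptions for illustration and are not the exact configuration used in [33].

```python
# Minimal sketch: a YOLO-style single-shot head on a fused feature map, plus a
# simple decision-fusion step that pools boxes from several sub-networks and
# suppresses duplicates with NMS. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.ops import nms


class SingleShotHead(nn.Module):
    """Predicts (x, y, w, h, objectness) + class scores per anchor per cell."""

    def __init__(self, in_ch: int, num_anchors: int = 3, num_classes: int = 10):
        super().__init__()
        self.pred = nn.Conv2d(in_ch, num_anchors * (5 + num_classes), kernel_size=1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # One forward pass yields a dense grid of predictions;
        # no separate region-proposal stage is needed.
        return self.pred(fused)


def decision_fusion(boxes_list, scores_list, iou_thr: float = 0.5):
    """Merge detections from several sub-networks and remove duplicates."""
    boxes = torch.cat(boxes_list, dim=0)    # (N, 4) boxes in (x1, y1, x2, y2)
    scores = torch.cat(scores_list, dim=0)  # (N,) confidence scores
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]


if __name__ == "__main__":
    fused = torch.randn(1, 128, 80, 80)      # output of a fusion module
    grid_preds = SingleShotHead(128)(fused)  # (1, 3*(5+10), 80, 80)
    print(grid_preds.shape)

    # Toy detections from three hypothetical sub-networks for the same image.
    boxes_list = [torch.tensor([[10., 10., 50., 50.]]),
                  torch.tensor([[12., 11., 49., 52.]]),
                  torch.tensor([[200., 200., 260., 240.]])]
    scores_list = [torch.tensor([0.9]), torch.tensor([0.8]), torch.tensor([0.7])]
    boxes, scores = decision_fusion(boxes_list, scores_list)
    print(boxes, scores)  # the two overlapping boxes collapse to one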