Multiscale Deformable Attention and Multilevel Features Aggregation for Remote Sensing Object Detection

Dong, Xiaohu; Qin, Yao; Fu, Ruigang; Gao, Yinghui; Liu, Songlin; Ye, Yuanxin; Li, Biao

doi:10.1109/lgrs.2022.3178479

Cited by 23 publications

(11 citation statements)

References 19 publications


“…LAG [24] proposes a hierarchical anchor generation algorithm that generates anchors in different layers based on the diagonal and aspect ratio of the object, making the anchors in each layer match better with the detection range of that layer. The authors of [25] proposed a new multi-scale deformable attention module and a multi-level feature aggregation module and inserts them into the feature pyramid network (FPN) to improve the detection performance of various shapes and sizes of remote sensing objects. RSADet [26] considers the spatial distribution, scale, and orientation changes of the objects in remote sensing images by introducing deformable convolution and a new bounding box confidence prediction branch.…”

Section: Related Work (mentioning)

confidence: 99%

MSA-YOLO: A Remote Sensing Object Detection Model Based on Multi-Scale Strip Attention

Su, Yu, Tan et al. 2023, Sensors

Remote sensing image object detection holds significant research value in resources and the environment. Nevertheless, complex background information and considerable size differences between objects in remote sensing images make it challenging. This paper proposes an efficient remote sensing image object detection model (MSA-YOLO) to improve detection performance. First, we propose a Multi-Scale Strip Convolution Attention Mechanism (MSCAM), which can reduce the introduction of background noise and fuse multi-scale features to enhance the focus of the model on foreground objects of various sizes. Second, we introduce the lightweight convolution module GSConv and propose an improved feature fusion layer, which makes the model more lightweight while improving detection accuracy. Finally, we propose the Wise-Focal CIoU loss function, which can reweight different samples to balance the contribution of different samples to the loss function, thereby improving the regression effect. Experimental results show that on the remote sensing image public datasets DIOR and HRRSD, the performance of our proposed MSA-YOLO model is significantly better than other existing methods.

“…UAV remote sensing images often contain noise originating from complex scenes, which can interfere with detection outcomes [2]. Additionally, the large feature map resolution of shallow networks tends to produce a lower level of feature abstraction and weaker semantic information, thereby containing more fine-grained details.…”

Section: RFE and CSA (mentioning)

confidence: 99%

“…In the domain of remote sensing image object detection, multi-scale features enhance the model's detection accuracy [2]. However, some semantic information is lost during the sampling operation performed for feature fusion.…”

Section: MBUS and MBDS (mentioning)

confidence: 99%

“…Object detection within UAV remote sensing images holds promise for applications in various sectors including urban planning, land monitoring, precision agriculture, updates to geographic information systems, and military operations, among others [1]. Nevertheless, the complexity of the background and the variability in size and orientation of objects in UAV remote sensing images introduce substantial challenges to object detection in UAV remote sensing images [2].…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Branch Parallel Networks for Object Detection in High-Resolution UAV Remote Sensing Images

Zhang, Guo et al. 2023, Drones

Uncrewed Aerial Vehicles (UAVs) are instrumental in advancing the field of remote sensing. Nevertheless, the complexity of the background and the dense distribution of objects both present considerable challenges for object detection in UAV remote sensing images. This paper proposes a Multi-Branch Parallel Network (MBPN) based on the ViTDet (Visual Transformer for Object Detection) model, which aims to improve object detection accuracy in UAV remote sensing images. Initially, the discriminative ability of the input feature map of the Feature Pyramid Network (FPN) is improved by incorporating the Receptive Field Enhancement (RFE) and Convolutional Self-Attention (CSA) modules. Subsequently, to mitigate the loss of semantic information, the sampling process of the FPN is replaced by Multi-Branch Upsampling (MBUS) and Multi-Branch Downsampling (MBDS) modules. Lastly, a Feature-Concatenating Fusion (FCF) module is employed to merge feature maps of varying levels, thereby addressing the issue of semantic misalignment. This paper evaluates the performance of the proposed model on both a custom UAV-captured WCH dataset and the publicly available NWPU VHR10 dataset. The experimental results demonstrate that the proposed model achieves an increase in APL of 2.4% and 0.7% on the WCH and NWPU VHR10 datasets, respectively, compared to the baseline model ViTDet-B.

“…In drone ground detection, single-modality data, such as RGB images [6][7][8], infrared images [9][10][11], and other spectral or radar data [12,13], are predominantly used. Multimodal data for target detection has received limited research [14,15].…”

Section: Introduction (mentioning)

confidence: 99%

MFMG-Net: Multispectral Feature Mutual Guidance Network for Visible–Infrared Object Detection

Zhao, Lou, Feng et al. 2024, Drones

Drones equipped with visible and infrared sensors play a vital role in urban road supervision. However, conventional methods using RGB-IR image pairs often struggle to extract effective features. These methods treat these spectra independently, missing the potential benefits of their interaction and complementary information. To address these challenges, we designed the Multispectral Feature Mutual Guidance Network (MFMG-Net). To prevent learning bias between spectra, we have developed a Data Augmentation (DA) technique based on the mask strategy. The MFMG module is embedded between two backbone networks, promoting the exchange of feature information between spectra to enhance extraction. We also designed a Dual-Branch Feature Fusion (DBFF) module based on attention mechanisms, enabling deep feature fusion by emphasizing correlations between the two spectra in both the feature channel and space dimensions. Finally, the fused features feed into the neck network and detection head, yielding ultimate inference results. Our experiments, conducted on the Aerial Imagery (VEDAI) dataset and two other public datasets (M3FD and LLVIP), showcase the superior performance of our method and the effectiveness of MFMG in enhancing multispectral feature extraction for drone ground detection.
