Relation-Attention Networks for Remote Sensing Scene Classification

Wang, Xin; Duan, Lin; Chen, Ning; Zhou, Huiyu

doi:10.1109/jstars.2021.3135566

Cited by 30 publications

(13 citation statements)

References 62 publications


“…To quantitatively evaluate the performance of our method, we have used several quantitative evaluation metrics including overall accuracy (OA), standard deviation (SD), confusion matrix (CM), and class average accuracy (AA) [27], [62]. OA is a direct measure of the classification accuracy of the model on the entire dataset:…”

Section: Results Comparison and Analysis 1) Evaluation Metrics (mentioning)

confidence: 99%
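The metrics named in the statement above can be illustrated with a minimal sketch. It assumes the usual definitions: OA is the fraction of correctly classified samples over the whole dataset, AA is the unweighted mean of per-class accuracies, and SD is the standard deviation of OA over repeated runs. The function names, the toy labels, and the per-run OA values are all hypothetical and not taken from the cited paper.

```python
import statistics


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Correct predictions (diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def average_accuracy(cm):
    """Unweighted mean of per-class accuracies (classes with no samples are skipped)."""
    per_class = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(per_class) / len(per_class)


# Toy labels for three scene classes (illustrative only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
aa = average_accuracy(cm)

# Hypothetical OA values from repeated runs, used only to show how SD is computed.
run_oas = [0.875, 0.85, 0.90]
sd = statistics.stdev(run_oas)

print(cm, oa, aa, sd)
```

With imbalanced classes OA and AA diverge, as they do here: the small third class pulls AA below OA.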

Zero-Shot Remote Sensing Scene Classification Method Based on Local-Global Feature Fusion and Weight Mapping Loss

Wang, Li, Tanvir et al. 2024

IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing


Zero-shot remote sensing scene classification aims to give a model the ability to identify unseen-class scenes based on seen-class scenes, and has become a research hotspot in the field of remote sensing. Contemporary approaches in zero-shot remote sensing scene classification primarily focus on extracting global information from scenes, neglecting nuanced local landscape features. This oversight diminishes the discriminative capabilities of recognition models. Furthermore, these methods overlook the semantic relevance between seen and unseen class scenes in training, leading to reduced emphasis on learning from varied scenes and subsequent declines in classification performance. To address these challenges, this paper proposes the "Zero-Shot Remote Sensing Scene Classification Method Based on Local-Global Feature Fusion and Weight Mapping Loss (LGFFWM)." The design incorporates a local-global feature fusion (LGFF) module enabling adaptive labeling and feature modeling of internal local landscapes, effectively merging them with global features for a more discriminative representation of remote sensing scenes. Furthermore, a weight mapping loss (WM Loss) function is introduced, leveraging a semantic correlation matrix to compel the model to prioritize learning seen class scenes that exhibit strong correlations with unseen class scenes by assigning higher training weights. Extensive experiments on classical remote sensing scene datasets, including UCM, AID, and NWPU, demonstrate the superiority of the proposed LGFFWM method over ten advanced comparative methods, yielding overall accuracy improvements of over 2.25%, 3.47%, and 0.44%, respectively. Additional experiments on the SIRI-WHU and RSSCN7 datasets underscore the transferability of LGFFWM, achieving overall accuracies of 53.50% and 47.37%, respectively.


“…To further show the effect of our MAANet, we compare it with a set of state-of-the-art RSSC algorithms, covering traditional non-DL methods (i.e., BoVW [7], IFK [7], LDA [7], LLC [8]) that mainly rely on mid-level features and DL-based methods that are closely related to our network. Specifically, these DL models are subdivided into: (1) traditional CNNs (i.e., GoogLeNet [7], CaffeNet [7], VGG-VD-16 [7], and VGG-16-CapsNet [15]); (2) gated networks (i.e., GBNet [18] and GBNet + global feature [18]); (3) feature pyramid networks (i.e., EFPN-DSE-TDFF [19] and RANet [20]); (4) global–local feature fusion networks (i.e., LCNN-BFF [21], HABFNet [22], MF2Net [23], and DAFGCN [24]); (5) attention-based networks (i.e., MS2AP [25], MSA-Network [26], SAFF [27], ResNet50+EAM [28], ACNet [29], CSDS [30], SEMSDNet [31], ACR-MLFF [32], CRAN [33], and TDFE-DAA [34]); and (6) currently popular transformers (i.e., ViT-B_32 [35], T2T-ViT-12 [36], V16_21k [37], ViT [35], PVT-V2-B0 [38], PiT-S [39], Swin-T [40], PVT-Medium [41], and T-CNN [42]). For a fair comparison, all results are obtained by the source codes or provided by the authors directly.…”

Section: Experiences and Results (mentioning)

confidence: 99%

“…In Ref. 20, Wang et al. constructed a relation-attention guided feature pyramid network (RANet) to learn multilevel features.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-attention aggregation network for remote sensing scene classification

Wang, Li, Shi et al. 2023

J. Appl. Rem. Sens.

Self Cite


Remote sensing (RS) scene classification is a highly challenging task because of the unique characteristics of RS scenes, such as high intra-class variability, large inter-class similarity, and various objects with different scales. Attention, interpreted as an important mechanism of the human visual system, can emphasize meaningful features of deep neural networks, which is beneficial for boosting the classification performance. Motivated by it, we present a multi-attention aggregation network (MAANet), which contains various specially designed attention models, for precise RS scene classification. First, a gated attention fluid coding structure is constructed for mining hierarchical gated attention features from RS images. Second, a progressive pyramid refinement architecture is designed to explore correlations of cross-layer attention features to learn enhanced multi-scale representations. Third, a two-stream attention aggregation structure, equipped with three different attention models, is developed to guide the generation of aggregated features. Finally, a scene label prediction module is proposed for scene label prediction. We conduct extensive experiments on three famous RS scene datasets, and the experimental results show that our MAANet outperforms a number of current representative state-of-the-art approaches for the RS scene classification task.


“…In Ref. 29, Wang et al. combined the relation network and attention mechanism to learn powerful feature representations of multiple levels to further improve the classification performance.…”

Section: Related Work (mentioning)

confidence: 99%

Text guided zero-shot scene classification of high spatial resolution remote sensing images

Liu, Chen, Zhou et al. 2024

J. Appl. Rem. Sens.


Recently, high spatial resolution remote sensing image scene classification has had a wide range of applications and has become one of the hotspots in the field of remote sensing research. Due to the complexity of the scenes in remote sensing images, it is impossible to annotate all ground object classes at once. To adapt to different application scenarios, high spatial resolution remote sensing image scene classification models need to have zero-shot generalization ability for unseen classes. To improve the zero-shot generalization ability of classification models, existing methods often start from the perspective of image features, thus ignoring the high-order semantic information in the scene. In fact, the association between higher-order semantic information in the scene is very important for the generalization ability of the classification model. People often use image information and its corresponding higher-order semantic information to complete remote sensing image scene understanding. Therefore, this work proposes a text guided remote sensing image pre-training model for zero-shot classification of high spatial resolution remote sensing image scenes. First, the transformer model is used to extract the embedded features of text and remote sensing images. Then, based on the aligned text and remote sensing image data, a contrastive learning method is used to train the model to learn the correspondence between text and image features. After the model training is completed, the nearest neighbor method is used to complete zero-shot classification on the target data. The effectiveness of the proposed method was verified on three remote sensing image scene classification benchmark datasets.
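The final nearest-neighbor step described in this abstract can be sketched roughly as follows. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the embedding vectors, class names, and function names are hypothetical stand-ins for the outputs of trained text and image encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb, class_text_embs):
    """Return the class whose text embedding is nearest to the image embedding."""
    return max(class_text_embs, key=lambda name: cosine(image_emb, class_text_embs[name]))


# Hypothetical text embeddings for unseen classes (one vector per class description).
class_text_embs = {
    "airport": [0.9, 0.1, 0.0],
    "forest": [0.0, 0.8, 0.6],
    "harbor": [0.1, 0.0, 0.95],
}

# Hypothetical embedding of one remote sensing image.
image_emb = [0.1, 0.7, 0.5]

predicted = zero_shot_predict(image_emb, class_text_embs)
print(predicted)
```

No classifier is trained on the unseen classes in this sketch; the prediction depends only on how well the shared text-image embedding space aligns the two modalities.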

