2021
DOI: 10.1109/jstars.2021.3109661

A Multiscale Attention Network for Remote Sensing Scene Images Classification

Abstract: Remote sensing scene image classification is of great value in civil and military fields. Deep learning models, especially convolutional neural networks (CNNs), have achieved great success in this task; however, they may suffer from two challenges: first, the sizes of category objects usually differ, but a conventional CNN extracts features with a fixed convolution extractor, which can fail to learn multi-scale features; second, some image regions may not be…
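The abstract's two challenges (fixed receptive fields and uninformative image regions) are what multiscale extraction and channel attention address. A minimal NumPy sketch, not the paper's actual architecture: box filters of several sizes stand in for convolutions with different kernel sizes, and a squeeze-and-excitation-style gate reweights the resulting channels. All weights here are random placeholders for what a real network would learn.

```python
import numpy as np

def multiscale_features(img, scales=(3, 5, 7)):
    """Box filters of several sizes stand in for convolutions with
    different kernel sizes; each scale becomes one feature channel."""
    H, W = img.shape
    channels = []
    for k in scales:
        pad = k // 2
        padded = np.pad(img, pad, mode="edge")
        out = np.zeros((H, W))
        for dy in range(k):
            for dx in range(k):
                out += padded[dy:dy + H, dx:dx + W]
        channels.append(out / (k * k))
    return np.stack(channels)  # (num_scales, H, W)

def channel_attention(feats, reduction=2):
    """Squeeze-and-excitation-style gate: global average pool, a small
    bottleneck MLP, then a per-channel sigmoid gate. Weights are random
    placeholders for learned parameters."""
    C = feats.shape[0]
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((max(C // reduction, 1), C)) * 0.1
    w2 = rng.standard_normal((C, max(C // reduction, 1))) * 0.1
    squeezed = feats.mean(axis=(1, 2))           # (C,) global average pool
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, one gate per channel
    return feats * gate[:, None, None]

img = np.arange(64, dtype=float).reshape(8, 8)
feats = multiscale_features(img)      # (3, 8, 8): one channel per scale
attended = channel_attention(feats)   # (3, 8, 8): channels reweighted
print(feats.shape, attended.shape)
```

The same image is thus represented at several receptive-field sizes at once, so objects of different sizes each have a matching scale, and the gate lets the network emphasize whichever scale is informative.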

Cited by 37 publications (26 citation statements). References 73 publications (66 reference statements).
“…To comprehensively evaluate the classification performance of the proposed HHTL framework, we compare our method with some state-of-the-art CNN-based and transformer-based methods. The CNN-based methods are GoogLeNet [1], [2], VGGNet-16 [1], [2], VGG-16-CapsNet [71], SCCov [24], VGG-VD16+MSCP+MRA [72], GBNet+global feature [40], MIDC-Net CS [73], EFPN-DSE-TDFF [41], DFAGCN [74], EfficientNet-B0-aux [75], SF-CNN with VGGNet [76], MG-CAP (Sqrt-E) [43], ACNet [52], ACR-MLFF [77], MSA-Network [78]. The transformer-based methods are T2T-ViT-12 [62], Pooling-based Vision Transformer-Small (PiT-S) [79], and Pyramid Vision Transformer-Medium (PVT-Medium) [80].…”
Section: Experimental Results and Comparisons
confidence: 99%
“…Similar to UCM and AID, we can find that the performance of our HHTL framework is best. Compared with other methods, when 10% of scenes are used for training, the enhancements in OA obtained by our HHTL framework are 15.88% (over GoogLeNet), 15.6% (over VGGNet-16), 6.99% (over VGG-16-CapsNet), 2.77% (over SCCov), 4% (over VGG-VD16+MSCP+MRA), 5.95% (over MIDC-Net CS), 0.98% (over ACNet), 2.11% (over EfficientNet-B0-aux), 2.18% (over SF-CNN with VGGNet), 2.06% (over ACR-MLFF), 1.69% (over MSA-Network), 1.24% (over MG-CAP (Sqrt-E)).

Method                      OA (%)        OA (%)   (two training ratios, as in the quoted table)
VGG-16-CapsNet [71]         91.63±0.19    94.74±0.17
SCCov [24]                  93.12±0.25    96.10±0.16
VGG-VD16+MSCP+MRA [72]      92.21±0.17    95.56±0.18
GBNet+global feature [40]   92.20±0.23    95.48±0.12
MIDC-Net CS [73]            88.51±0.41    92.95±0.17
EFPN-DSE-TDFF [41]          94.02±0.21    94.50±0.30
ACNet [52]                  93.33±0.29    95.38±0.29
DFAGCN [74]                 -             94.88±0.22
EfficientNet-B0-aux [75]    93.69±0.11    96.17±0.16
SF-CNN with VGGNet [76]     93.60±0.12    96.66±0.11
ACR-MLFF [77]               92.73±0.12    95.06±0.33
MSA-Network [78]            93.53±0.21    96.01±0.43
MG-CAP (Sqrt-E) [43]        93.34±0.18    96.12±0.12
HHTL (ours)                 95.62±0.13    96.88±0.21…”
Section: Experimental Results and Comparisons
confidence: 99%
“…Zhang et al. proposed a multiscale attention network (MSA‐Network) for HSI classification, where the model integrates multiple scales of channels and position attention modules to enhance different scales of features [18]. Gong et al.…”
Section: Introduction
confidence: 99%
“…Furthermore, Tian et al. proposed a multiscale dense convolutional network [17], which enhances multi-scale features and incorporates squeeze-and-excitation to process features of various scenes adaptively, solving the problem of inadequate feature extraction in conventional CNNs. Zhang et al. proposed a multiscale attention network (MSA-Network) for HSI classification, where the model integrates multiple scales of channel and position attention modules to enhance features of different scales [18]. Gong et al. fused spectral information with multi-scale spatial information by using two-dimensional convolutions with different kernel sizes [19].…”
Section: Introduction
confidence: 99%
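The excerpts above describe the MSA-Network as pairing channel attention with position attention modules. A minimal NumPy sketch of the position (spatial) half of that idea, not the MSA-Network's actual module: per-location statistics pooled across channels are squashed into a gate that reweights every channel at each location. A trained module would learn this mixing rather than averaging the mean and max maps.

```python
import numpy as np

def position_attention(feats):
    """Position (spatial) attention sketch: pool across channels to get
    per-location statistics, squash them into a gate in (0, 1), and
    reweight every channel at each location by that gate."""
    mean_map = feats.mean(axis=0)   # (H, W) channel-wise mean
    max_map = feats.max(axis=0)     # (H, W) channel-wise max
    gate = 1.0 / (1.0 + np.exp(-(mean_map + max_map) / 2.0))  # sigmoid
    return feats * gate[None, :, :]

feats = np.random.default_rng(1).standard_normal((4, 8, 8))
out = position_attention(feats)
print(out.shape)  # (4, 8, 8)
```

Where channel attention decides *which* feature maps matter, this gate decides *where* in the image to look, which is why the two are complementary for scenes whose informative regions vary.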