2022
DOI: 10.1109/lgrs.2021.3070016
Multilevel Feature Fusion Networks With Adaptive Channel Dimensionality Reduction for Remote Sensing Scene Classification

Abstract: Scene classification in very high resolution (VHR) remote sensing (RS) images is a challenging task due to the complex and diverse content of the images. Recently, convolutional neural networks (CNNs) have been utilized to tackle this task. However, CNNs cannot fully meet the needs of scene classification due to clutter and small objects in VHR images. To handle these challenges, this letter presents a novel multilevel feature fusion network with adaptive channel dimensionality reduction for RS scene classification…
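The abstract pairs two ingredients: fusing features drawn from several levels of the network, and adaptively reducing each level's channel dimensionality before fusion. The full design sits behind the truncation above, so the following PyTorch sketch only illustrates that general recipe; the module names, channel widths, SE-style gating, and the 30-class head are assumptions of this sketch, not the letter's actual architecture.

```python
# Hypothetical sketch; module names, widths, and gating are assumptions,
# not the architecture from the letter (whose abstract is truncated above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelReduce(nn.Module):
    """1x1-conv channel reduction, re-weighted by a squeeze-and-excitation
    style gate so the reduction adapts to the content of each image."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # (B, out_ch, 1, 1) global context
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
            nn.Sigmoid(),              # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.reduce(x)
        return y * self.gate(y)        # adaptively re-weight reduced channels


class MultilevelFusion(nn.Module):
    """Fuse feature maps from several backbone stages into class logits."""

    def __init__(self, in_channels=(256, 512, 512), fused_ch=256, num_classes=30):
        super().__init__()
        self.reducers = nn.ModuleList(
            ChannelReduce(c, fused_ch) for c in in_channels
        )
        self.classifier = nn.Linear(fused_ch, num_classes)

    def forward(self, feats):          # feats: list of stage feature maps
        target = feats[-1].shape[-2:]  # spatial size of the deepest stage
        fused = sum(
            F.adaptive_avg_pool2d(r(f), target)
            for r, f in zip(self.reducers, feats)
        )
        return self.classifier(fused.mean(dim=(2, 3)))  # GAP, then classify


# Smoke test with made-up stage shapes (VGG-like strides).
feats = [torch.randn(2, 256, 56, 56), torch.randn(2, 512, 28, 28),
         torch.randn(2, 512, 14, 14)]
print(MultilevelFusion()(feats).shape)  # torch.Size([2, 30])
```

Given VGG-like stage outputs with 256, 512, and 512 channels, the sketch maps them to class logits; the gate lets the 1x1 reduction emphasize informative channels per image instead of discarding a fixed subset.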

Cited by 34 publications (32 citation statements)
References 24 publications
“…To comprehensively evaluate the classification performance of the proposed HHTL framework, we compare our method with some state-of-the-art CNN-based and transformer-based methods. The CNN-based methods are GoogLeNet [1], [2], VGGNet-16 [1], [2], VGG-16-CapsNet [71], SCCov [24], VGG-VD16+MSCP+MRA [72], GBNet+global feature [40], MIDC-Net CS [73], EFPN-DSE-TDFF [41], DFAGCN [74], EfficientNet-B0-aux [75], SF-CNN with VGGNet [76], MG-CAP (Sqrt-E) [43], ACNet [52], ACR-MLFF [77], MSA-Network [78]. The transformer-based methods are T2T-ViT-12 [62], Pooling-based Vision Transformer-Small (PiT-S) [79], and Pyramid Vision Transformer-Medium (PVT-Medium) [80].…”
Section: Experimental Results and Comparisons
Mentioning (confidence: 99%)
“…As with UCM and AID, we find that the performance of our HHTL framework is the best. Compared with the other methods, when 10% of the scenes are used for training, the enhancements in OA obtained by our HHTL framework are 15.88% (over GoogLeNet), 15.6% (over VGGNet-16), 6.99% (over VGG-16-CapsNet), 2.77% (over SCCov), 4% (over VGG-VD16+MSCP+MRA), 5.95% (over MIDC-Net CS), 0.98% (over ACNet), 2.11% (over EfficientNet-B0-aux), 2.18% (over SF-CNN with VGGNet), 2.06% (over ACR-MLFF), 1.69% (over MSA-Network), and 1.24% (over MG-CAP (Sqrt-E)).

Method | OA (20% training) | OA (50% training)
VGG-16-CapsNet [71] | 91.63±0.19 | 94.74±0.17
SCCov [24] | 93.12±0.25 | 96.10±0.16
VGG-VD16+MSCP+MRA [72] | 92.21±0.17 | 95.56±0.18
GBNet+global feature [40] | 92.20±0.23 | 95.48±0.12
MIDC-Net CS [73] | 88.51±0.41 | 92.95±0.17
EFPN-DSE-TDFF [41] | 94.02±0.21 | 94.50±0.30
ACNet [52] | 93.33±0.29 | 95.38±0.29
DFAGCN [74] | - | 94.88±0.22
EfficientNet-B0-aux [75] | 93.69±0.11 | 96.17±0.16
SF-CNN with VGGNet [76] | 93.60±0.12 | 96.66±0.11
ACR-MLFF [77] | 92.73±0.12 | 95.06±0.33
MSA-Network [78] | 93.53±0.21 | 96.01±0.43
MG-CAP (Sqrt-E) [43] | 93.34±0.18 | 96.12±0.12
HHTL (ours) | 95.62±0.13 | 96.88±0.21…”
Section: Experimental Results and Comparisons
Mentioning (confidence: 99%)
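The OA margins in such comparisons are plain differences. As a worked example, the gaps between HHTL and a few baselines in the table's 20%-training column can be recomputed directly; the quoted 10%-training deltas refer to a different split and are not reproduced here, and the snippet itself is illustrative, not from the cited work:

```python
# Worked example: OA gaps (percentage points) between HHTL and selected
# baselines, using the 20%-training column of the table above.
oa_20 = {
    "SCCov": 93.12,
    "ACNet": 93.33,
    "SF-CNN with VGGNet": 93.60,
    "MG-CAP (Sqrt-E)": 93.34,
    "HHTL": 95.62,
}
for name, val in oa_20.items():
    if name != "HHTL":
        print(f"HHTL over {name}: {oa_20['HHTL'] - val:+.2f} points")
# e.g. HHTL over SCCov: +2.50 points
```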
“…In view of the strong feature description capabilities of CNNs, they have been broadly applied to many computer vision tasks, including remote sensing scene classification [5]-[9]. At present, many CNN-based methods use the features extracted from the top layers for RS image representation, since the top layers give more semantically meaningful representations that are suitable for capturing global visual scene context.…”
Section: A. CNN Models
Mentioning (confidence: 99%)
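The excerpt above describes the common practice of representing an RS scene by its top-layer CNN features. A minimal sketch of that practice, assuming a torchvision VGG-16 backbone, a 224x224 input, and a hypothetical 30-class linear probe (none of which are specified by the excerpt):

```python
# Illustrative only: backbone choice, input size, and the 30-class probe are
# assumptions; the excerpt names no specific network.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features
backbone.eval()


@torch.no_grad()
def scene_descriptor(images: torch.Tensor) -> torch.Tensor:
    """Map (B, 3, 224, 224) images to (B, 512) top-layer descriptors."""
    fmap = backbone(images)        # (B, 512, 7, 7): top conv-layer features
    return fmap.mean(dim=(2, 3))   # global average pooling -> global context


# A linear classifier on the descriptor, e.g. for a 30-class scene dataset.
classifier = nn.Linear(512, 30)
logits = classifier(scene_descriptor(torch.randn(2, 3, 224, 224)))
print(logits.shape)                # torch.Size([2, 30])
```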
“…Recently, with the rapid development of deep learning (DL), convolutional neural networks (CNNs) have demonstrated competitive performance on many computer vision tasks, including RS scene classification [5]-[8]. In CNNs, the lower-level features from shallow layers reflect the details of images, while the higher-level ones from deep layers contain rich semantic information and thus are more discriminative, abstract, and robust [9].…”
Section: Introduction
Mentioning (confidence: 99%)
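The shallow-versus-deep contrast drawn in this excerpt can be made concrete by reading out intermediate feature maps of a backbone. A minimal sketch, assuming a torchvision ResNet-18 purely for illustration:

```python
# Illustrative only: the ResNet-18 backbone is an assumption of this sketch.
import torch
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
extractor = create_feature_extractor(
    backbone, return_nodes={"layer1": "low", "layer2": "mid", "layer4": "high"}
)

with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 224, 224))
for name, fmap in feats.items():
    print(name, tuple(fmap.shape))
# low  (1, 64, 56, 56)   fine spatial detail, few channels
# mid  (1, 128, 28, 28)
# high (1, 512, 7, 7)    coarse grid, rich semantic channels
```

The printed shapes show exactly the trade the excerpt describes: spatial resolution shrinks from 56x56 to 7x7 while channel depth grows from 64 to 512.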