2022
DOI: 10.1109/tgrs.2022.3194505
EMTCAL: Efficient Multiscale Transformer and Cross-Level Attention Learning for Remote Sensing Scene Classification

Cited by 52 publications (32 citation statements)
References 62 publications
“…On the other hand, the attention mechanism quantifies how much weight should be assigned to each feature. To date, the attention mechanism has been widely used in remote sensing, including scene classification [48], [49], object detection [50], [51], pansharpening [52], [53], and change detection [54], [55]. The essence of the attention-based deep neural network is to incorporate the similarity between different channels or positions to enhance pixel representations.…”
Section: A. Affinity Modelling
confidence: 99%
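The affinity idea in the quoted passage, using similarity between positions to reweight and enhance representations, can be sketched minimally as scaled dot-product self-attention. This is a generic illustration, not the implementation of any cited model; all identifiers here are made up for the example.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over positions.

    x: (n, d) array of n pixel/position features of dimension d.
    Each output row is a similarity-weighted mixture of all rows,
    so every position's representation is enhanced by its affinities
    with the others.
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                 # (n, n) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ x                            # reweighted features

# toy usage: 6 positions with 4-dimensional features
x = np.random.default_rng(0).normal(size=(6, 4))
y = self_attention(x)
print(y.shape)  # (6, 4)
```

Because the softmax rows are non-negative and sum to one, each output is a convex combination of the input positions, which is the "affinity modelling" behaviour the passage describes.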
“…Various transformer-based studies have been discussed in the literature [29]–[33]. For the classification of remote sensing scenes, a unique model of efficient multiscale transformer and cross-level attention learning was proposed in [29]. To obtain global visual features and rich contextual information from multiple features, this model used a multilevel feature extraction module and a contextual information extraction module, respectively.…”
Section: Related Work
confidence: 99%
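The cross-level idea summarized above, combining coarse semantic features with fine detail features to gather contextual information, can be sketched with a generic cross-attention step. This is an assumption-laden illustration, not the EMTCAL architecture; the function name, shapes, and residual fusion choice are all hypothetical.

```python
import numpy as np

def cross_level_attention(high, low):
    """Generic cross-level attention sketch (not the EMTCAL code).

    high: (m, d) coarse, semantically rich features (queries)
    low:  (n, d) fine, detail-rich features (keys and values)

    High-level positions attend to low-level positions, so contextual
    information from both feature levels is combined.
    """
    d = high.shape[1]
    scores = high @ low.T / np.sqrt(d)       # (m, n) cross-level affinities
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)        # softmax over low-level positions
    return high + w @ low                    # residual fusion of the two levels

# toy usage: 4 coarse positions attend over 16 fine positions
rng = np.random.default_rng(1)
fused = cross_level_attention(rng.normal(size=(4, 8)), rng.normal(size=(16, 8)))
print(fused.shape)  # (4, 8)
```

The residual form keeps the high-level features intact while adding an attention-pooled summary of the low-level map, one common way to fuse features across network stages.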
“…Bazi et al. [60] applied an attention mechanism to focus on different areas of the image and integrate global information. Tang et al. [61] proposed a transformer that used multi-level features to mine the potential context information of remote sensing scenes. However, we believe that in complex landscapes, the transformer model has limitations in feature modeling and suffers from high computational complexity.…”
Section: Introduction
confidence: 99%