2022
DOI: 10.1109/jstars.2022.3155665
Homo–Heterogenous Transformer Learning Framework for RS Scene Classification

Abstract: Remote sensing (RS) scene classification plays an essential role in the RS community and has attracted increasing attention due to its wide applications. Recently, benefiting from the powerful feature learning capabilities of convolutional neural networks (CNNs), the accuracy of RS scene classification has been significantly improved. Although the existing CNN-based methods achieve excellent results, there is still room for improvement. First, the CNN-based methods are adept at capturing the global information …

Cited by 41 publications (27 citation statements)
References 85 publications
“…Lu et al 62: Explores the content of semantic labels
Li et al 111: Captures the key region
Bi et al 110: Direct control over semantic labels at the bag level
Li et al 111: Reduces the dimensionality of bilinear pooled features
Wang et al 112: Addresses the problem of large scale variance
Xu et al 76: Extracts the context information
Xu et al 114: Improves discriminative ability among ambiguous classes
Deng et al 63: Extracts semantic features and local structural features
Ma et al 75: Captures contextual relationships in complex RSI
[AB] Wang et al 115: Reduces the number of training parameters
Li et al 116: Fuses global and local features.…”
Section: Model / Reference(s) / Remarks (mentioning)
confidence: 99%
“…Zhao et al 118: Extracts stronger discriminative features
Li et al 119: Eliminates irrelevant and redundant information
Tang et al 120: Minimizes intra-class distance and maximizes inter-class distance
Bi et al 121: Enhances local semantic representation capability
Chen et al 122: Improves feature extraction ability
[GAN] Lin et al 98: Captures powerful discriminative features
Yu et al 106: Collects contextual information
Wei et al 105: Incorporates a multi-feature fusion layer
Ma et al 75: Generates labeled samples
Miao et al 107: Deals with the overfitting problem of deep models
[SOSFB] He et al 61: Preserves second-order information.…”
Section: Model / Reference(s) / Remarks (mentioning)
confidence: 99%
“…Based on the basic CNN, CTNet 30 develops an enhanced version of the CNN-based network. HHTL 31 carefully designs the patches before they are input to the transformers and fuses them after feature extraction. Some methods improve classification performance by adding operations such as multi-scale processing, spatial attention, and feature aggregation.…”
Section: NWPU Dataset (mentioning)
confidence: 99%
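The patch-and-fuse idea attributed to HHTL in the statement above can be illustrated with a minimal, hypothetical sketch: split an RS image into non-overlapping patches, encode them with a transformer, and fuse the patch features for classification. This is not the authors' HHTL code; the patch size, embedding width, encoder depth, mean-pool fusion, and the 45-class output (chosen to match NWPU-RESISC45) are illustrative assumptions, and the homogeneous/heterogeneous branch design of the paper is omitted.

```python
# Minimal sketch (assumption, not the HHTL implementation): patch splitting,
# transformer encoding, and simple fusion of patch features for classification.
import torch
import torch.nn as nn

class PatchTransformerClassifier(nn.Module):
    def __init__(self, img_size=256, patch_size=16, embed_dim=256,
                 depth=4, num_heads=8, num_classes=45):
        super().__init__()
        # Non-overlapping patch embedding implemented as a strided convolution.
        self.patch_embed = nn.Conv2d(3, embed_dim,
                                     kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                           # x: (B, 3, H, W)
        tokens = self.patch_embed(x)                # (B, D, H/ps, W/ps)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, D) patch tokens
        tokens = self.encoder(tokens + self.pos_embed)
        fused = tokens.mean(dim=1)                  # simple fusion of patch features
        return self.head(fused)

logits = PatchTransformerClassifier()(torch.randn(2, 3, 256, 256))
print(logits.shape)  # torch.Size([2, 45])
```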
“…Therefore, Li et al 30 proposed the remote sensing transformer (TRS), which integrates self-attention into a residual neural network (ResNet), replaces spatial convolutions with Multi-Head Self-Attention (MHSA) layers, and concatenates multiple pure transformer encoders to improve attention-dependent representation learning. Ma et al 31 proposed a homo-heterogenous transformer learning (HHTL) framework for HRRS scene classification that, in keeping with the characteristics of the transformer, divides the image into multiple patches.…”
(mentioning)
confidence: 99%
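To make the "MHSA instead of spatial convolutions" idea in the statement above concrete, here is a minimal, hypothetical sketch of a bottleneck-style residual block whose 3x3 spatial convolution is replaced by multi-head self-attention over the flattened spatial positions. It is not the TRS authors' implementation; the channel widths, head count, and normalization choices are assumptions, and positional encodings are omitted for brevity.

```python
# Illustrative sketch (assumption, not TRS): a residual block where spatial
# convolution is replaced by multi-head self-attention over spatial positions.
import torch
import torch.nn as nn

class MHSABottleneck(nn.Module):
    def __init__(self, channels=256, width=64, num_heads=4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, width, kernel_size=1)
        self.attn = nn.MultiheadAttention(embed_dim=width, num_heads=num_heads,
                                          batch_first=True)
        self.expand = nn.Conv2d(width, channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, _, h, w = x.shape
        y = self.reduce(x)                       # 1x1 conv reduces channels
        seq = y.flatten(2).transpose(1, 2)       # (B, H*W, width) token sequence
        attn_out, _ = self.attn(seq, seq, seq)   # self-attention replaces 3x3 conv
        y = attn_out.transpose(1, 2).reshape(b, -1, h, w)
        y = self.expand(y)                       # 1x1 conv restores channels
        return self.act(self.norm(y) + x)        # residual connection

out = MHSABottleneck()(torch.randn(2, 256, 16, 16))
print(out.shape)  # torch.Size([2, 256, 16, 16])
```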
“…From the available transformation change information, they can distinguish the changed regions. In recent years, deep learning methods, especially convolutional neural networks (CNNs), have been widely used in various computer vision (CV) tasks, such as image segmentation [11], [12], [13], scene classification [14], [15], [16], object detection [17], [18], saliency detection [19], [20], image enhancement [21], [22], group detection [23], and so on. Because CNNs can learn multi-level features and semantic features of bi-temporal images, they have also been introduced into CD to effectively describe the change information [24], [25], [26].…”
Section: Introduction (mentioning)
confidence: 99%