2022
DOI: 10.1109/lgrs.2022.3143368
A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images

Cited by 104 publications (58 citation statements)
References 24 publications
“…The structure of the ViT is completely different from that of the CNN: it treats the 2D image as a 1D ordered sequence and applies the self-attention mechanism for global dependency modelling, demonstrating stronger global feature extraction. Driven by this, many researchers in the field of remote sensing have introduced ViTs for segmentation-related tasks, such as land cover classification [63]–[68], urban scene parsing [69]–[74], change detection [75,76], road extraction [77] and, especially, building extraction [78]. For example, Chen et al. [79] proposed a sparse token Transformer to learn the global dependency of tokens in both the spatial and channel dimensions, achieving state-of-the-art accuracy on benchmark building extraction datasets.…”
Section: B. ViT-Based Building Extraction Methods
confidence: 99%
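The mechanism the excerpt above describes — a ViT flattening the 2D image into a 1D ordered sequence of patch tokens and applying self-attention so every token attends to every other — can be sketched minimally in NumPy. This is an illustrative sketch, not code from the cited paper or any of the referenced networks; `patchify` and `self_attention` are hypothetical names.

```python
import numpy as np

def patchify(image, patch=4):
    """Split an HxWxC image into a 1D sequence of flattened patches,
    as a ViT does before applying self-attention (illustrative sketch)."""
    H, W, C = image.shape
    rows, cols = H // patch, W // patch
    tokens = (image[:rows * patch, :cols * patch]
              .reshape(rows, patch, cols, patch, C)
              .transpose(0, 2, 1, 3, 4)        # group pixels by patch
              .reshape(rows * cols, patch * patch * C))
    return tokens                               # (num_tokens, token_dim)

def self_attention(x):
    """Plain scaled dot-product self-attention over the token sequence.
    Every token attends to every other token, i.e. global dependency."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)               # (N, N) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ x                          # convex mix of all tokens

img = np.random.rand(8, 8, 3)                   # toy "image"
tokens = self_attention(patchify(img))
print(tokens.shape)                             # (4, 48): 2x2 patches, 48-dim
```

Because each output token is a softmax-weighted mixture of all input tokens, the receptive field is global from the first layer — the property the excerpt contrasts with the CNN's local convolutions.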
“…the Massachusetts building dataset, the WHU building dataset and the Inria Aerial Image Labeling dataset. The selected methods include convolutional networks, such as U-Net [21], DeepLabv3+ [88], SRI-Net [16], DS-Net [49], BRRNet [20], SiU-Net [18], CU-Net [19], EU-Net [89], DE-Net [90], MA-FCN [48], MANet [53], MAP-Net [27], Bias-UNet [57] and CBRNet [35], and ViT-based networks such as SwinUperNet [34], Sparse Token Transformer (STT) [79], MSST-Net [80], BANet [72] and DC-Swin [69].…”
Section: B. Comparison of State-of-the-Art Methods
confidence: 99%
“…Transformer models perform well on natural language processing (NLP) and computer vision (CV) tasks and have attracted considerable attention in remote sensing. Some researchers have applied Transformers to remote sensing tasks, such as remote sensing image segmentation [38], [39] and remote sensing image change detection [15], [16]. For example, a Transformer-based method has recently been proposed to detect changes in remote sensing images.…”
Section: Related Work
confidence: 99%
“…In Transformer-based networks, self-attention is treated as the main operation in the encoder phase, not merely as a single module in the decoder phase. References [25,26] successfully applied Transformer models to remote sensing imagery; however, they considered only colour features as inputs.…”
Section: Acquiring Long-Range Dependency
confidence: 99%
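The distinction drawn in the last excerpt — self-attention as the core operation of the encoder, repeated in every layer, rather than a single bolt-on module — corresponds to stacking encoder layers like the single-head NumPy sketch below. This is a generic Transformer encoder layer under standard assumptions (pre-norm, ReLU MLP, random illustrative weights), not the actual implementation of any network cited above.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def encoder_layer(x, W_qkv, W_o, W1, W2):
    """One Transformer encoder layer: self-attention is the main
    operation, followed by a small MLP, each with a residual connection."""
    d = x.shape[-1]
    q, k, v = np.split(layer_norm(x) @ W_qkv, 3, axis=-1)
    s = q @ k.T / np.sqrt(d)                    # global token-to-token scores
    s -= s.max(-1, keepdims=True)
    a = np.exp(s)
    a /= a.sum(-1, keepdims=True)               # softmax attention weights
    x = x + (a @ v) @ W_o                       # attention + residual
    h = np.maximum(layer_norm(x) @ W1, 0.0)     # ReLU MLP
    return x + h @ W2                           # MLP + residual

rng = np.random.default_rng(0)
d, n = 16, 5                                    # toy dimensions
x = rng.normal(size=(n, d))                     # n tokens of width d
y = encoder_layer(
    x,
    rng.normal(size=(d, 3 * d)) * 0.1,          # fused Q/K/V projection
    rng.normal(size=(d, d)) * 0.1,              # output projection
    rng.normal(size=(d, 4 * d)) * 0.1,          # MLP expand
    rng.normal(size=(4 * d, d)) * 0.1,          # MLP contract
)
print(y.shape)                                  # (5, 16)
```

An encoder built by chaining such layers makes every stage globally attentive, which is what lets these networks capture the long-range dependencies the section title refers to.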