2022
DOI: 10.3390/ijgi11030165
Multi-Resolution Transformer Network for Building and Road Segmentation of Remote Sensing Image

Abstract: Extracting buildings and roads from remote sensing images is very important for land cover monitoring and of great help to urban planning. Currently, deep learning methods are used by the majority of building and road extraction algorithms. However, existing semantic segmentation networks have a limited receptive field on high-resolution remote sensing images, which means they cannot model long-distance scene context well during pixel classification, and the image features are compressed…

Cited by 43 publications (24 citation statements)
References 30 publications
“…The structure of the ViT is completely different from that of the CNN: it treats the 2D image as a 1D ordered sequence and applies the self-attention mechanism for global dependency modelling, demonstrating stronger global feature extraction. Driven by this, many researchers in the field of remote sensing have introduced ViTs for segmentation-related tasks, such as land cover classification [63][64][65][66][67][68], urban scene parsing [69][70][71][72][73][74], change detection [75,76], road extraction [77], and especially building extraction [78]. For example, Chen et al. [79] proposed a sparse token Transformer to learn the global dependency of tokens in both spatial and channel dimensions, achieving state-of-the-art accuracy on benchmark building extraction datasets.…”
Section: B. ViT-Based Building Extraction Methods
confidence: 99%
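The excerpt above describes the ViT's core idea: flatten the 2D image into a 1D ordered sequence of patch tokens and mix them with self-attention. A minimal NumPy sketch (patch size, image size, and single-head attention are illustrative choices, not the cited papers' exact configurations):

```python
import numpy as np

def image_to_tokens(img, patch=4):
    """Split an HxWxC image into non-overlapping patches and flatten
    each patch into a token, giving a 1D ordered sequence."""
    H, W, C = img.shape
    tokens = (img.reshape(H // patch, patch, W // patch, patch, C)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(-1, patch * patch * C))
    return tokens  # shape: (num_tokens, token_dim)

def self_attention(X):
    """Single-head self-attention: every token attends to every other
    token, so the receptive field is global in a single layer."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ X                               # context-mixed tokens

img = np.random.rand(8, 8, 3)
tokens = image_to_tokens(img, patch=4)   # 2x2 grid -> 4 tokens of dim 48
out = self_attention(tokens)
print(tokens.shape, out.shape)           # (4, 48) (4, 48)
```

Real ViTs additionally project patches through a learned embedding, add position encodings, and use multi-head attention, but the sequence-of-tokens view shown here is what distinguishes them from CNNs.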
“…Tao et al. [26] designed a spatial information inference structure (SIIS-Net), enabling multidirectional message passing across the rows and columns of feature maps to infer occluded roads. Transformer-based methods [27]-[30] calculate the relationships between all pixels of a feature map using a self-attention module. In a self-attention module, each pixel has a global receptive field, thus supporting a stronger global-contextual structure reasoning ability than the conventional spatial attention modules found in CoANet and SIIS-Net, greatly improving the correctness and topological completeness of road extraction results.…”
Section: A. Road Extraction
confidence: 99%
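The "global receptive field" claim in this excerpt can be made concrete: on a flattened H×W feature map, one self-attention step produces a dense (H·W)×(H·W) weight matrix, so every pixel directly weights every other pixel, however distant. A small sketch (sizes and channel count are arbitrary illustrations):

```python
import numpy as np

# A 3x3 convolution lets each pixel see at most 9 neighbours per layer;
# self-attention over the flattened map reaches all pixels in one step.
H = W = 4
F = H * W
feats = np.random.rand(F, 8)                        # F pixels, 8 channels

scores = feats @ feats.T / np.sqrt(feats.shape[1])  # all-pairs affinities
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)            # row-wise softmax

# Every entry is strictly positive: pixel (0,0) directly weights pixel
# (3,3), which is what lets a transformer reason about long, partially
# occluded road structures without stacking many local layers.
print(attn.shape)            # (16, 16)
print((attn > 0).all())      # True
```

This density is also the cost: attention is quadratic in the number of pixels, which motivates reduced-query designs like the dominant-query mechanism discussed in the next excerpt's source.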
“…The scale parameter is a coefficient that controls the number of dominant queries in our proposed Dominant-Transformer block. We set the scale parameter to [1, 3, 5, 7, 9, 10, 30, 50, 90] and tested the performance of each value. The evaluation indicators for the different scale values are shown in TABLE V; the third column of the table is the value of U, the number of dominant queries.…”
Section: Effect of the Scale Parameter of the Dominant Transformer
confidence: 99%
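One plausible reading of the dominant-query idea above: keep only the U highest-scoring queries so attention cost drops from N×N to U×N. The selection rule below (ranking queries by their aggregate key affinity, with U set directly by the scale value) is a hypothetical sketch — the cited paper's exact mapping from the scale parameter to U is not reproduced in this excerpt:

```python
import numpy as np

def select_dominant_queries(Q, K, scale):
    """Hypothetical dominant-query selection: keep the U = scale queries
    whose total affinity with all keys is largest, reducing attention
    cost from N*N to U*N. The original paper's rule for deriving U from
    the scale parameter may differ."""
    N, d = Q.shape
    U = min(scale, N)                       # number of dominant queries
    affinity = Q @ K.T / np.sqrt(d)         # full N x N score matrix
    importance = affinity.sum(axis=1)       # aggregate score per query
    idx = np.argsort(importance)[-U:]       # indices of the top-U queries
    return idx, affinity[idx]               # U x N reduced attention map

Q = np.random.rand(100, 32)                 # 100 queries, dim 32
K = np.random.rand(100, 32)
idx, scores = select_dominant_queries(Q, K, scale=10)
print(idx.shape, scores.shape)              # (10,) (10, 100)
```

Sweeping the scale value, as the excerpt describes, then amounts to trading accuracy against the U×N attention cost.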
“…Transformers in segmentation. Very recently, transformers have started to be used in the segmentation of overhead imagery [17,42], achieving good performance. Transformers had already become common in other domains, for example TransUnet [7], ViT-V-Net [6], TransClaw U-Net [5], UTNet [13], CoTr [45], and SwinUnet [3] in medical image segmentation.…”
Section: Related Work
confidence: 99%