2021
DOI: 10.3390/rs13163065

Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

Abstract: Semantic segmentation from very fine resolution (VFR) urban scene images plays a significant role in several application scenarios including autonomous driving, land cover classification, urban planning, etc. However, the tremendous details contained in the VFR image, especially the considerable variations in scale and appearance of objects, severely limit the potential of the existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, which …

citations
Cited by 88 publications
(20 citation statements)
references
References 61 publications
“…Secondly, the transformer structure also benefits a large number of downstream tasks, e.g., semantic segmentation [33], remote sensing image classification [34, 35, 36] and behavior analysis [37, 38, 39]. However, in tasks such as semantic segmentation and remote sensing image classification, the contribution of a transformer structure is still limited to its advantage in visual features extraction.…”
Section: Related Work (mentioning)
confidence: 99%
“…The transformer-yolov5 [47] has also been used in underwater maritime object detection. The transformer has been applied to semantic segmentation for efficient inference and long-range modeling in previous works [48,49]. Different from these, MAT starts from the vanilla transformer block and incorporates the memory mechanism to enhance the representation ability.…”
Section: Vision Transformer (mentioning)
confidence: 99%
“…At present, the most advanced semantic labeling methods of HRRSIs rely on deep neural networks [13]-[17]. These networks are all derived from Fully Convolutional Networks (FCN) [18], which was first proposed by Long et al. It uses a fully convolutional layer to replace the fully connected layer in the classification network so that the network maintains its two-dimensional high-level semantic features and then restores the resolution of the image through upsampling to obtain a semantic mask.…”
Section: Introduction (mentioning)
confidence: 99%