2022
DOI: 10.3390/rs14051294

Efficient Depth Fusion Transformer for Aerial Image Semantic Segmentation

Abstract: Taking depth into consideration has been proven to improve the performance of semantic segmentation through providing additional geometry information. Most existing works adopt a two-stream network, extracting features from color images and depth images separately using two branches of the same structure, which suffer from high memory and computation costs. We find that depth features acquired by simple downsampling can also play a complementary part in the semantic segmentation task, sometimes even better tha…

Cited by 15 publications (3 citation statements)
References 40 publications
“…This basic example shows that after going through the patch merging layer, the height and width of the feature map are halved, but the depth is doubled. To reduce the resolution, adjust the number of channels used, and complete the hierarchical design of the ST, merge correction was used [14]. In this case, each downsample is two, and elements are chosen at every other point in the row and column directions before being spliced together to expand, as seen in Figure 8.…”
Section: Figure 3: the Architecture Of The Swin Transformer (mentioning)
confidence: 99%
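
The patch merging step described in the statement above can be sketched in a few lines. This is a minimal illustration in plain Python, not the cited authors' code: the function name `patch_merge`, the nested-list tensor layout, and the placeholder projection weights are all assumptions made for the example, and the layer normalization that usually precedes the projection in Swin-style models is omitted for brevity.

```python
def patch_merge(x, weight):
    """Sketch of patch merging on an H x W x C feature map (nested lists).

    x      : H x W x C nested lists, with H and W even (assumed layout).
    weight : 4C x 2C nested lists, a placeholder linear projection.
    Returns an (H/2) x (W/2) x 2C feature map.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out_channels = len(weight[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Pick every other element in the row and column directions and
            # splice the four neighbours together along the channel axis (4C).
            merged = x[i][j] + x[i + 1][j] + x[i][j + 1] + x[i + 1][j + 1]
            # Project 4C -> 2C so the channel depth is doubled, not quadrupled.
            row.append([
                sum(m * w_row[k] for m, w_row in zip(merged, weight))
                for k in range(out_channels)
            ])
        out.append(row)
    return out


# Toy input: a 4 x 4 map with one channel holding the values 0..15.
x = [[[4 * i + j] for j in range(4)] for i in range(4)]
# Hypothetical 4 x 2 projection weights, chosen only to keep the arithmetic simple.
weight = [[1, 0], [0, 1], [1, 0], [0, 1]]
out = patch_merge(x, weight)
print(len(out), len(out[0]), len(out[0][0]))
```

With this toy input the 4 x 4 x 1 map becomes 2 x 2 x 2: height and width are halved and the channel depth is doubled, matching the behaviour the statement describes.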
“…Some alternative approaches to image segmentation employ graph neural networks (GNN) [52][53][54], which operate on graph nodes constructed from an image in a preprocessing step. Another recent line of research in remote sensing image segmentation uses attention-based transformers to replace or supplement the backbone CNN [55][56][57][58].…”
Section: Introduction (mentioning)
confidence: 99%
“…Many scholars have attempted to improve segmentation accuracy by using deep convolutional networks and feature fusion operations. Yan et al. [21] proposed the fusion of color and depth images at the front end of feature extraction to improve segmentation accuracy. Wang et al. [22] proposed further improvement of segmentation accuracy by combining the high-level features of two different images obtained by parallel deep networks.…”
Section: Introduction (mentioning)
confidence: 99%