2020
DOI: 10.1007/978-3-030-58621-8_33
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

Cited by 219 publications (128 citation statements)
References 38 publications
“…The depth images are encoded into HHA [16] images. For a fair comparison, following [10]–[12], [17], we use DeepLab V3+ [1] as the baseline. All the backbone networks (ResNet-50 [18] and ResNet-101) are pre-trained on the ImageNet dataset [19].…”
Section: Results
confidence: 99%
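The setup quoted above (HHA-encoded depth and ImageNet-pretrained ResNet backbones) can be sketched roughly as follows. This is a hypothetical illustration only: it uses torchvision's DeepLabV3 models (torchvision does not ship DeepLab V3+, which the cited works use), and the class count and the naive logit-averaging fusion are chosen purely for the example.

import torch
import torchvision

# HHA encodes depth as a 3-channel image, so the same ImageNet-pretrained
# ResNet-50 backbone can encode both modalities without modification.
num_classes = 40  # illustrative; dataset-dependent (e.g. NYUDv2-40)

rgb_branch = torchvision.models.segmentation.deeplabv3_resnet50(
    weights_backbone=torchvision.models.ResNet50_Weights.IMAGENET1K_V1,
    num_classes=num_classes,
)
hha_branch = torchvision.models.segmentation.deeplabv3_resnet50(
    weights_backbone=torchvision.models.ResNet50_Weights.IMAGENET1K_V1,
    num_classes=num_classes,
)

rgb = torch.randn(1, 3, 480, 640)   # RGB image
hha = torch.randn(1, 3, 480, 640)   # HHA encoding of the depth image
# Naive late fusion for illustration: average the per-branch logits.
logits = (rgb_branch(rgb)["out"] + hha_branch(hha)["out"]) / 2
print(logits.shape)  # torch.Size([1, 40, 480, 640])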
“…ACNet [11] proposed an Attention Complementary Module to extract weighted features from the RGB and depth branches, but it lacks long-range cross-modality dependencies. Chen et al. [12] proposed an SA-Gate unit to ensure cross-modality feature aggregation via a channel-wise attention mechanism, but it lacks non-local spatial cross-modality interaction, which is profoundly important for RGB-D semantic segmentation. CANet [13] proposed to take advantage of long-range cross-modality interdependencies via position and channel attention modules, but it only aggregates the non-local cross-modality features at the final stage of the encoder, which cannot exploit multi-scale non-local cross-modality information.…”
Section: Introduction
confidence: 99%
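As a rough illustration of the channel-wise attention gating discussed in the quote, the following minimal sketch re-weights RGB and depth features with per-channel weights before summing them. It is a simplification in the spirit of SA-Gate/ACNet, not the exact unit from either paper; the class name ChannelGate and all hyper-parameters are invented for the example.

import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Minimal channel-wise attention gate for RGB-D feature aggregation."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, (2 * channels) // reduction),
            nn.ReLU(inplace=True),
            nn.Linear((2 * channels) // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb, feat_depth):
        b, c, _, _ = feat_rgb.shape
        joint = torch.cat([feat_rgb, feat_depth], dim=1)   # B x 2C x H x W
        weights = self.mlp(joint.mean(dim=(2, 3)))         # B x 2C, in (0, 1)
        w_rgb, w_depth = weights[:, :c], weights[:, c:]
        # Re-weight each modality channel-wise, then aggregate by summation.
        return (feat_rgb * w_rgb[..., None, None]
                + feat_depth * w_depth[..., None, None])

gate = ChannelGate(channels=256)
fused = gate(torch.randn(2, 256, 60, 80), torch.randn(2, 256, 60, 80))
print(fused.shape)  # torch.Size([2, 256, 60, 80])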
“…We compare our model with 9 state-of-the-art (SOTA) methods, including 3 deep-learning-based RGB semantic segmentation methods (DUC [26], DANet [6] and HRNet [22]), 3 RGB-T semantic segmentation approaches (MFNet [8], RTFNet [23] and PSTNet [20]) and 3 RGB-D semantic segmentation models (LDFNet [11], ACNet [10] and SA-Gate (ResNet-50) [4]). The procedure for converting an RGB semantic segmentation model into an extended RGB-T model is described as follows.…”
Section: Comparison with State-of-the-Art Methods
confidence: 99%
“…Recently, with the rapid development of imaging techniques, many studies [4, 8, 11, 20, 23, 27] employ multi-modality data (e.g., RGB-T images and RGB-D images) to address issues arising from traditional RGB semantic segmentation. These multi-modality semantic segmentation models are usually divided into two categories, i.e., feature-level fusion based and image-level fusion based ones.…”
Section: Multi-modality Semantic Segmentation
confidence: 99%
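The two categories named in the quote can be contrasted with a small sketch: image-level fusion stacks the modalities at the network input and uses a single encoder, while feature-level fusion encodes each modality separately and merges intermediate feature maps. All layer shapes below are illustrative and are not taken from any of the cited models.

import torch
import torch.nn as nn

rgb = torch.randn(1, 3, 480, 640)
thermal = torch.randn(1, 1, 480, 640)  # e.g. a single-channel thermal image

# Image-level fusion: concatenate the modalities at the input; the first
# convolution of the single encoder accepts 4 channels.
early_encoder = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
early_feat = early_encoder(torch.cat([rgb, thermal], dim=1))

# Feature-level fusion: encode each modality with its own branch, then merge
# the intermediate feature maps (element-wise summation here).
rgb_encoder = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
thermal_encoder = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
late_feat = rgb_encoder(rgb) + thermal_encoder(thermal)

print(early_feat.shape, late_feat.shape)  # both torch.Size([1, 64, 240, 320])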
“…As shown in Fig. 1(b), the features from the two modalities are further fused by various mechanisms, such as element-wise summation [17, 23], gating [7, 8], and attention [21, 44], in the encoder. Such approaches only process the paired complementary cues in the encoder but ignore the cross-modal information during decoding.…”
Section: Introduction
confidence: 99%
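The encoder-only fusion pattern criticized in the quote can be sketched as follows: every encoder stage merges the two modality streams (element-wise summation here, where a gate or attention block could equally be substituted), so the decoder afterwards only ever receives already-fused features. The stage widths and the two-stream wiring are arbitrary choices for illustration, not a reproduction of any cited architecture.

import torch
import torch.nn as nn

stages_rgb = nn.ModuleList([nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
                            for c_in, c_out in [(3, 64), (64, 128), (128, 256)]])
stages_dep = nn.ModuleList([nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
                            for c_in, c_out in [(3, 64), (64, 128), (128, 256)]])

x_rgb = torch.randn(1, 3, 480, 640)
x_dep = torch.randn(1, 3, 480, 640)   # e.g. HHA-encoded depth
fused_pyramid = []
for s_rgb, s_dep in zip(stages_rgb, stages_dep):
    x_rgb, x_dep = s_rgb(x_rgb), s_dep(x_dep)
    x_rgb = x_rgb + x_dep             # element-wise summation fusion per stage
    fused_pyramid.append(x_rgb)       # only fused features reach the decoder

print([f.shape[1] for f in fused_pyramid])  # [64, 128, 256]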