Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation

Chen, Xiaokang; Lin, Kwan-Yee; Wang, Jingbo; Wu, Wayne; Qian, Chen; Li, Hongsheng; Zeng, Gang

doi:10.1007/978-3-030-58621-8_33

Cited by 219 publications

(128 citation statements)

References 38 publications


“…The Depth images are encoded into HHA [16] images. For a fair comparison, following [10]-[12], [17], we use the DeepLab V3+ [1] as the baseline. All the backbone networks (ResNet-50 [18] and ResNet-101) are pre-trained on ImageNet dataset [19].…”

Section: Results (mentioning)

confidence: 99%

“…ACNet [11] proposed an Attention Complementary Module to extract weighted features from RGB and Depth branches, but it lacks long-range cross-modality dependencies. Chen et al. [12] proposed an SA-Gate unit to ensure cross-modality features aggregation via channel-wise attention mechanism, but it lacks non-local spatial cross-modality interaction, which is profoundly important for RGB-D semantic segmentation. CANet [13] proposed to take advantages of long-range cross-modality interdependencies via position and channel attention modules, but it only aggregates the non-local cross-modality features at the final stage of the encoder, which cannot exploit multi-scale non-local cross-modality information.…”

Section: Introduction (mentioning)

confidence: 99%


Non-Local Aggregation for RGB-D Semantic Segmentation

Zhang, Xue, Xie, et al. 2021

IEEE Signal Process. Lett.


Exploiting both RGB (2D appearance) and Depth (3D geometry) information can improve the performance of semantic segmentation. However, due to the inherent difference between the RGB and Depth information, it remains a challenging problem in how to integrate RGB-D features effectively. In this letter, to address this issue, we propose a Non-local Aggregation Network (NANet), with a well-designed Multi-modality Non-local Aggregation Module (MNAM), to better exploit the non-local context of RGB-D features at multi-stage. Compared with most existing RGB-D semantic segmentation schemes, which only exploit local RGB-D features, the MNAM enables the aggregation of non-local RGB-D information along both spatial and channel dimensions. The proposed NANet achieves comparable performances with state-of-the-art methods on popular RGB-D benchmarks, NYUDv2 and SUN-RGBD.
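The contrast drawn above between local fusion and non-local spatial cross-modality interaction can be illustrated with a small sketch. This is not the MNAM from NANet or the SA-Gate unit; it is a generic dot-product attention in which every RGB position attends to every Depth position. The function names (`softmax`, `dot`, `nonlocal_cross_modal`) and the residual addition are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def nonlocal_cross_modal(rgb, depth):
    """Hypothetical non-local cross-modality aggregation.

    rgb, depth: lists of N position vectors, each with C channels.
    Each RGB position attends to ALL depth positions (non-local along
    the spatial axis) and adds the attended depth feature as a residual.
    """
    fused = []
    for query in rgb:
        # Similarity of this RGB position to every depth position.
        weights = softmax([dot(query, key) for key in depth])
        # Weighted average of depth features, one value per channel.
        context = [
            sum(w * key[c] for w, key in zip(weights, depth))
            for c in range(len(query))
        ]
        fused.append([q + ctx for q, ctx in zip(query, context)])
    return fused


if __name__ == "__main__":
    rgb = [[0.5, 0.0], [0.0, 3.0]]
    depth = [[1.0, 1.0], [1.0, 1.0]]
    print(nonlocal_cross_modal(rgb, depth))
```

A purely local scheme would combine `rgb[i]` only with `depth[i]`; here the attention weights let a position draw on depth features from anywhere in the map.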


“…We compare our model with 9 state-of-the-art (SOTA) methods, including 3 deep learning based RGB semantic segmentation methods (DUC [26], DANet [6] and HRNet [22]), 3 RGB-T semantic segmentation approaches (MFNet [8], RTFNet [23] and PSTNet [20]) and 3 RGB-D semantic segmentation models (LDFNet [11], ACNet [10] and SA-Gate(ResNet-50) [4]). The procedure of converting the RGB semantic segmentation model into an extended RGB-T model is described as follows.…”

Section: Comparison With State-of-the-art Methods (mentioning)

confidence: 99%

“…Recently, with the rapid development of imaging techniques, many studies [4,8,11,20,23,27] employ multi-modality data (e.g., RGB-T images and RGB-D images) to address some issues arising from the traditional RGB semantic segmentation. These multi-modality semantic segmentation models are usually divided into two categories, i.e., feature-level fusion based and image-level fusion based ones.…”

Section: Multi-modality Semantic Segmentation (mentioning)

confidence: 99%

ABMDRNet: Adaptive-weighted Bi-directional Modality Difference Reduction Network for RGB-T Semantic Segmentation

Zhang, Zhao, Luo, et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


“…As shown in Fig. 1(b), the features from two modalities are further fused by various mechanisms such as the element-wise summation [17,23], gate [7,8], and attention [21,44] in the encoder. Such approaches only process the paired complementary cues in the encoder, but ignoring the cross-modal information during decoding.…”

Section: Introduction (mentioning)

confidence: 99%

Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation

Zhang, Yang, Xiong, et al. 2022

Preprint


Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are designed via a two-stream network. In general, jointly reasoning the color and geometric information from RGBD is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively utilize multimodal information in both the encoder and decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and fuse deeply multi-level paired complementary information. To learn more robust deep representations and rich multi-modal information, we introduce a dual-branch decoder to effectively leverage the correlations and complementary cues of different tasks. Extensive experiments on NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance against the state-of-the-art methods.
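The citation statement for this paper names three encoder-side fusion mechanisms: element-wise summation, gate, and attention. A minimal sketch of how they differ is given below, on plain feature vectors rather than full feature maps. It does not reproduce any cited paper's module; the function names and the scalar parameters `w_rgb`, `w_depth`, and `bias` are hypothetical stand-ins for learned weights.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_sum(rgb, depth):
    """Element-wise summation: both modalities contribute equally."""
    return [r + d for r, d in zip(rgb, depth)]


def fuse_gate(rgb, depth, w_rgb=1.0, w_depth=1.0, bias=0.0):
    """Gated fusion: a per-element gate g in (0, 1) blends the inputs.

    w_rgb, w_depth and bias are hypothetical scalars standing in for
    learned parameters.
    """
    fused = []
    for r, d in zip(rgb, depth):
        g = sigmoid(w_rgb * r + w_depth * d + bias)
        fused.append(g * r + (1.0 - g) * d)
    return fused


def fuse_attention(rgb, depth):
    """Attention-style fusion: one softmax weight per modality,
    computed from each modality's mean activation."""
    s_rgb = sum(rgb) / len(rgb)
    s_depth = sum(depth) / len(depth)
    peak = max(s_rgb, s_depth)
    e_rgb = math.exp(s_rgb - peak)
    e_depth = math.exp(s_depth - peak)
    a_rgb = e_rgb / (e_rgb + e_depth)
    a_depth = 1.0 - a_rgb
    return [a_rgb * r + a_depth * d for r, d in zip(rgb, depth)]


if __name__ == "__main__":
    rgb, depth = [1.0, 2.0], [3.0, 4.0]
    print(fuse_sum(rgb, depth))
    print(fuse_gate(rgb, depth))
    print(fuse_attention(rgb, depth))
```

Summation has no parameters and cannot down-weight a noisy modality, whereas the gate and attention variants produce convex combinations, so each fused value stays between the two inputs.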

