2021
DOI: 10.1109/lsp.2021.3066071
Non-Local Aggregation for RGB-D Semantic Segmentation

Abstract: Exploiting both RGB (2D appearance) and depth (3D geometry) information can improve the performance of semantic segmentation. However, due to the inherent differences between RGB and depth information, it remains challenging to integrate RGB-D features effectively. In this letter, to address this issue, we propose a Non-local Aggregation Network (NANet), with a well-designed Multi-modality Non-local Aggregation Module (MNAM), to better exploit the non-local context of RGB-D features at multi-…
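The non-local aggregation idea in the abstract can be illustrated with a minimal sketch: depth features attend over all spatial positions of the RGB branch via dot-product affinity, and the attended context is fused back residually. This is a simplified, single-head NumPy illustration, not the authors' exact MNAM; the function name and shapes are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_aggregate(rgb, depth):
    """Toy non-local (cross-modal attention) aggregation.

    rgb, depth: feature maps of shape (C, H, W).
    Queries come from the RGB branch; keys and values from the
    depth branch, so every RGB position can gather geometric
    context from ALL depth positions (the "non-local" part).
    """
    C, H, W = rgb.shape
    q = rgb.reshape(C, H * W).T              # (HW, C) queries
    k = depth.reshape(C, H * W).T            # (HW, C) keys
    v = depth.reshape(C, H * W).T            # (HW, C) values
    attn = softmax(q @ k.T / np.sqrt(C))     # (HW, HW) affinity, rows sum to 1
    ctx = (attn @ v).T.reshape(C, H, W)      # attended depth context
    return rgb + ctx                          # residual fusion
```

A real module would add learned projections for q, k, v; here they are identity maps to keep the sketch self-contained.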

Cited by 67 publications (9 citation statements) · References 24 publications
“…[table rows: method / backbone / scores] ResNet-101 52.2 77.4; Malleable 2.5D [25] ResNet-101 50.9 76.9; SA-Gate [17] ResNet-101 52.4 77.9; NANet [31] ResNet-101 52.3 77.9; CEN-PSPNet [24] ResNet-152 52.5 77.7; InverseForm [1] ResNet-101 53.… With our proposed HS3 training scheme, we are able to consistently improve baseline scores compared to deep supervision. These improvements come with no added inference cost.…”
Section: Results On Cityscapes
confidence: 99%
“…In the second phase, we reuse the weights of the pretrained model to initialize the semantic segmentation network, and train the whole model in an end-to-end manner using the Adam optimizer. We employ different initial learning rates of 1×10⁻⁴, 2×10… Evaluation metrics: We follow previous works [16], [24] and employ PA, MPA, MIoU, and FWIoU to evaluate the performance of our static semantic segmentation model. For dynamic-to-static image translation, we adopt the four popular metrics L1, L2, PSNR, and SSIM to evaluate the quality of the generated static images.…”
Section: A Experimental Settingsmentioning
confidence: 99%
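The four segmentation metrics named in the quote above (PA, MPA, MIoU, FWIoU) all derive from a single confusion matrix over pixel labels. A minimal sketch of the standard definitions, assuming integer label maps with every class present in the ground truth (the function name is ours, for illustration):

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """PA, MPA, MIoU, FWIoU from integer label maps of equal shape."""
    # Confusion matrix: rows = ground truth class, cols = predicted class.
    cm = np.bincount(num_classes * gt.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    gt_per_class = cm.sum(axis=1).astype(float)     # pixels of each true class
    pred_per_class = cm.sum(axis=0).astype(float)   # pixels predicted as each class
    union = gt_per_class + pred_per_class - tp

    pa = tp.sum() / cm.sum()                # pixel accuracy
    mpa = (tp / gt_per_class).mean()        # mean per-class accuracy
    iou = tp / union
    miou = iou.mean()                       # mean intersection-over-union
    freq = gt_per_class / cm.sum()
    fwiou = (freq * iou).sum()              # frequency-weighted IoU
    return pa, mpa, miou, fwiou
```

Production implementations additionally guard against classes absent from the ground truth (zero denominators), omitted here for brevity.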
“…In recent years, multi-modality data have been utilized for semantic segmentation to tackle the issue that RGB-only segmentation methods are vulnerable to lighting conditions, and numerous RGB-Depth (RGB-D) [11,12,14,18,20,22,23,32,33] and RGB-T [2,13,15,16,17,19,21,34,35] segmentation algorithms have been proposed. Multi-modality feature fusion is the core challenge for multi-modality semantic segmentation.…”
Section: Multi-modality Semantic Segmentationmentioning
confidence: 99%
“…Generally, feature aggregation is independent of feature extraction [2,11,20,13,17]. In particular, several algorithms [14,22,23] attempt to aggregate the multi-modality features interactively. Namely, they feed the aggregated features into the feature extraction block of the next layer, as depicted by dashed arrows in Figure 1 (b).…”
Section: Introductionmentioning
confidence: 99%
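The "interactive" aggregation described above, where the fused feature of one layer is fed back into the feature-extraction block of the next layer, can be sketched as a simple loop. This is a toy stand-in (vectors and weight matrices instead of conv blocks; all names are ours), not any cited method's actual architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def interactive_fusion(rgb, depth, rgb_ws, depth_ws, fuse_ws):
    """Layer-wise interactive aggregation.

    rgb, depth: (C,) feature vectors; each weight list holds one
    (C, C) matrix per layer, standing in for an extraction block.
    After each layer, the fused feature replaces the RGB-branch
    input of the next layer (the dashed-arrow pattern in the text).
    """
    f_rgb, f_depth = rgb, depth
    fused = rgb
    for w_r, w_d, w_f in zip(rgb_ws, depth_ws, fuse_ws):
        f_rgb = relu(w_r @ f_rgb)              # RGB extraction block
        f_depth = relu(w_d @ f_depth)          # depth extraction block
        fused = relu(w_f @ (f_rgb + f_depth))  # aggregation of both branches
        f_rgb = fused                          # interactive: fused feeds next layer
    return fused
```

By contrast, the non-interactive scheme the paragraph mentions first would keep `f_rgb` untouched and fuse only once at the end.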