2017
DOI: 10.1007/978-3-319-50115-4_41

Deep Multispectral Semantic Scene Understanding of Forested Environments Using Multimodal Fusion

Cited by 125 publications (110 citation statements)
References 8 publications
“…Finally, we present extensive experimental evaluations of our proposed unimodal and multimodal architectures on benchmark scene understanding datasets including Cityscapes (Cordts et al, 2016), Synthia (Ros et al, 2016), SUN RGB-D (Song et al, 2015), ScanNet (Dai et al, 2017) and Freiburg Forest (Valada et al, 2016b). The results demonstrate that our model sets the new state-of-the-art on all these benchmarks considering the computational efficiency and the fast inference time of 72ms on a consumer grade GPU.…”
Section: Introduction
confidence: 91%
“…In the late fusion approach, identical network streams are first trained individually on a specific modality and the feature maps are fused towards the end of the network using concatenation (Eitel et al, 2015) or element-wise summation (Valada et al, 2016b), followed by learning deeper fused representations. However, this does not enable the network to adapt the fusion to changing scene context.…”
Section: Related Work
confidence: 99%
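The excerpt above contrasts two late-fusion variants: identical per-modality streams whose feature maps are merged by concatenation or element-wise summation, after which further layers learn a deeper fused representation. Below is a minimal PyTorch sketch of that general pattern, not the architecture of either cited paper; the layer sizes, channel counts, class count, and the LateFusionNet name are illustrative assumptions.

```python
import torch
import torch.nn as nn


class LateFusionNet(nn.Module):
    """Two-stream late-fusion sketch: one encoder per modality, fused near the end."""

    def __init__(self, fusion="sum", in_ch_a=3, in_ch_b=3, num_classes=6):
        super().__init__()

        def make_stream(in_ch):
            # Identical shallow streams stand in for per-modality encoders that
            # would normally be trained individually on their own modality.
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            )

        self.stream_a = make_stream(in_ch_a)  # e.g. RGB
        self.stream_b = make_stream(in_ch_b)  # e.g. NIR or depth
        self.fusion = fusion
        fused_ch = 128 if fusion == "sum" else 256
        # Layers after the merge point learn the deeper fused representation.
        self.head = nn.Sequential(
            nn.Conv2d(fused_ch, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, 1),
        )

    def forward(self, x_a, x_b):
        f_a = self.stream_a(x_a)
        f_b = self.stream_b(x_b)
        if self.fusion == "sum":
            fused = f_a + f_b                     # element-wise summation
        else:
            fused = torch.cat([f_a, f_b], dim=1)  # channel concatenation
        return self.head(fused)


# Example: fuse a 3-channel and a 1-channel modality by element-wise summation.
net = LateFusionNet(fusion="sum", in_ch_b=1)
logits = net(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 6, 64, 64])
```

Because the merge happens at a fixed point with fixed weights, this scheme cannot reweight the modalities per scene, which is the limitation the excerpt points out.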
“…According to several studies [27, 17], methods based on multiple encoders have a better capability to capture complementary and cross-modal interdependent features. Therefore, our proposed framework is based on a multi-encoder-based method.…”
Section: Multi-modal Fusion
confidence: 99%