2021
DOI: 10.1016/j.imavis.2020.104042
Deep multimodal fusion for semantic image segmentation: A survey

Abstract: Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in some complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvement. These fusion approaches take the benefits of multiple information sources and generate an o…

Cited by 124 publications (42 citation statements). References 119 publications (145 reference statements).
“…While depictions of early and late fusion styles have been relatively consistent across multiple papers [9,17,37,41,75,84,119], there are still cases where other terms have been used. In [31], one network architecture described as multi-view-one-network is essentially early fusion and one-view-one-network could be considered late fusion.…”
Section: Applying the Taxonomy (mentioning)
confidence: 96%
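To make the quoted distinction concrete, here is a minimal sketch of the two fusion styles for an RGB-D segmentation input. It is illustrative only, not taken from the survey or the citing paper; the PyTorch module layout, channel counts, and number of classes are assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Early fusion: modalities are concatenated at the input and
    processed by one shared network ("multi-view-one-network")."""
    def __init__(self, num_classes=19):
        super().__init__()
        # 3 RGB channels + 1 depth channel fused before any convolution
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # fuse at the input
        return self.classifier(self.backbone(x))

class LateFusionNet(nn.Module):
    """Late fusion: each modality gets its own stream and the outputs
    are merged near the end ("one-view-one-network")."""
    def __init__(self, num_classes=19):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, num_classes, 1),
            )
        self.rgb_stream = stream(3)
        self.depth_stream = stream(1)

    def forward(self, rgb, depth):
        # element-wise sum of per-modality score maps; other merge ops
        # (concatenation + 1x1 conv, averaging) are equally common
        return self.rgb_stream(rgb) + self.depth_stream(depth)

rgb = torch.randn(1, 3, 64, 64)
depth = torch.randn(1, 1, 64, 64)
print(EarlyFusionNet()(rgb, depth).shape)  # torch.Size([1, 19, 64, 64])
print(LateFusionNet()(rgb, depth).shape)   # torch.Size([1, 19, 64, 64])
```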
“…In addition to the presentation of many different network architectures, it was observed that multimodal 2-D models can perform well on a 3-D task, especially since pre-trained 2-D networks were more mature than 3-D networks. [119] also performed a review of research using multimodal image data such as RGB-D for image segmentation.…”
Section: Domain Specific Solutions (mentioning)
confidence: 99%
“…The process of combining images from multiple sources into a single image is referred to as multi-source image fusion technology, where the resulting fused image is more useful than any of the input images; it is of major importance to photogrammetry tasks in computer vision [26][27][28][29]. A detailed review can be found in [30].…”
Section: Sohn and Dowman (mentioning)
confidence: 99%
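As a toy illustration of the idea only, not the methods reviewed in [30], the simplest fusion rule is a pixel-wise weighted average of co-registered source images. The function below and its weights are assumptions made for demonstration.

```python
import numpy as np

def weighted_fusion(images, weights):
    """Fuse co-registered source images by a pixel-independent weighted average.

    `images` is a list of H x W arrays from different sensors/sources and
    `weights` should sum to 1. Real systems use far richer rules
    (multi-scale transforms, learned fusion networks), but the goal is the
    same: one fused image carrying more usable information than any input.
    """
    stack = np.stack(images, axis=0).astype(np.float32)
    w = np.asarray(weights, dtype=np.float32).reshape(-1, 1, 1)
    return (w * stack).sum(axis=0)

optical = np.random.rand(128, 128)   # e.g. an optical band
infrared = np.random.rand(128, 128)  # e.g. a thermal/IR band
fused = weighted_fusion([optical, infrared], [0.6, 0.4])
print(fused.shape)  # (128, 128)
```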
“…There are many ways of incorporating the parameters into our model [25]. More specifically, we can insert them at the earlier layers, mid-layers, or the last layers of our model.…”
Section: Using Process Parameters As Extra Supervision (mentioning)
confidence: 99%
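A minimal sketch of what the quoted idea could look like in code: scalar process parameters are tiled to feature-map size and concatenated into a small network at an early, middle, or last stage. The class, channel sizes, and the `insert_at` switch are illustrative assumptions, not the cited model.

```python
import torch
import torch.nn as nn

class ParamConditionedNet(nn.Module):
    """Illustrative only: inject extra (process) parameters at the
    early, middle, or last stage of a small convolutional network."""
    def __init__(self, num_params=2, insert_at="mid", num_classes=10):
        super().__init__()
        self.insert_at = insert_at
        extra = num_params if insert_at == "early" else 0
        self.stage1 = nn.Sequential(nn.Conv2d(3 + extra, 32, 3, padding=1), nn.ReLU())
        extra = num_params if insert_at == "mid" else 0
        self.stage2 = nn.Sequential(nn.Conv2d(32 + extra, 64, 3, padding=1), nn.ReLU())
        extra = num_params if insert_at == "last" else 0
        self.head = nn.Conv2d(64 + extra, num_classes, 1)

    @staticmethod
    def _tile(params, like):
        # broadcast a (N, P) parameter vector to (N, P, H, W) feature maps
        n, p = params.shape
        return params.view(n, p, 1, 1).expand(n, p, like.shape[2], like.shape[3])

    def forward(self, x, params):
        if self.insert_at == "early":
            x = torch.cat([x, self._tile(params, x)], dim=1)
        x = self.stage1(x)
        if self.insert_at == "mid":
            x = torch.cat([x, self._tile(params, x)], dim=1)
        x = self.stage2(x)
        if self.insert_at == "last":
            x = torch.cat([x, self._tile(params, x)], dim=1)
        return self.head(x)

net = ParamConditionedNet(insert_at="mid")
out = net(torch.randn(2, 3, 32, 32), torch.randn(2, 2))
print(out.shape)  # torch.Size([2, 10, 32, 32])
```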