2021
DOI: 10.1016/j.imavis.2020.104042
Deep multimodal fusion for semantic image segmentation: A survey

Abstract: Recent advances in deep learning have shown excellent performance in various scene understanding tasks. However, in some complex environments or under challenging conditions, it is necessary to employ multiple modalities that provide complementary information on the same scene. A variety of studies have demonstrated that deep multimodal fusion for semantic image segmentation achieves significant performance improvement. These fusion approaches take the benefits of multiple information sources and generate an o…

Cited by 124 publications (42 citation statements). References 119 publications (145 reference statements).
“…While depictions of early and late fusion styles have been relatively consistent across multiple papers [9,17,37,41,75,84,119], there are still cases where other terms have been used. In [31], one network architecture described as multi-view-one-network is essentially early fusion and one-view-one-network could be considered late fusion.…”
Section: Applying the Taxonomy (mentioning)
confidence: 96%
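To make the quoted distinction concrete, here is a minimal sketch of the two fusion styles for an RGB-D segmentation input. It is illustrative only, not taken from the survey or the citing paper; the PyTorch module layout, channel counts, and number of classes are assumptions.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Early fusion: modalities are concatenated at the input and
    processed by one shared network ("multi-view-one-network")."""
    def __init__(self, num_classes=19):
        super().__init__()
        # 3 RGB channels + 1 depth channel fused before any convolution
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # fuse at the input
        return self.classifier(self.backbone(x))

class LateFusionNet(nn.Module):
    """Late fusion: each modality gets its own stream and the outputs
    are merged near the end ("one-view-one-network")."""
    def __init__(self, num_classes=19):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, num_classes, 1),
            )
        self.rgb_stream = stream(3)
        self.depth_stream = stream(1)

    def forward(self, rgb, depth):
        # element-wise sum of per-modality score maps; other merge ops
        # (concatenation + 1x1 conv, averaging) are equally common
        return self.rgb_stream(rgb) + self.depth_stream(depth)

rgb = torch.randn(1, 3, 64, 64)
depth = torch.randn(1, 1, 64, 64)
print(EarlyFusionNet()(rgb, depth).shape)  # torch.Size([1, 19, 64, 64])
print(LateFusionNet()(rgb, depth).shape)   # torch.Size([1, 19, 64, 64])
```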
“…In addition to the presentation of many different network architectures, it was observed that multimodal 2-D models can perform well on a 3-D task, especially since pre-trained 2-D networks were more mature than 3-D networks. [119] also performed a review of research using multimodal image data such as RGB-D for image segmentation.…”
Section: Domain Specific Solutions (mentioning)
confidence: 99%
“…The process of combining images from multiple sources into a single image is referred to as multi-source image fusion technology, where the resulting fused image is more useful than any of the input images; it is of major importance to photogrammetry tasks in computer vision [26][27][28][29]. A detailed review can be found in [30].…”
Section: Sohn and Dowman (mentioning)
confidence: 99%
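As a toy illustration of the idea only, not the methods reviewed in [30], the simplest fusion rule is a pixel-wise weighted average of co-registered source images. The function below and its weights are assumptions made for demonstration.

```python
import numpy as np

def weighted_fusion(images, weights):
    """Fuse co-registered source images by a pixel-independent weighted average.

    `images` is a list of H x W arrays from different sensors/sources and
    `weights` should sum to 1. Real systems use far richer rules
    (multi-scale transforms, learned fusion networks), but the goal is the
    same: one fused image carrying more usable information than any input.
    """
    stack = np.stack(images, axis=0).astype(np.float32)
    w = np.asarray(weights, dtype=np.float32).reshape(-1, 1, 1)
    return (w * stack).sum(axis=0)

optical = np.random.rand(128, 128)   # e.g. an optical band
infrared = np.random.rand(128, 128)  # e.g. a thermal/IR band
fused = weighted_fusion([optical, infrared], [0.6, 0.4])
print(fused.shape)  # (128, 128)
```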
“…There are many ways of incorporating the parameters into our model [25]. More specifically, we can insert them at the earlier layers, mid-layers, or the last layers of our model.…”
Section: Using Process Parameters As Extra Supervision (mentioning)
confidence: 99%
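A minimal sketch of what the quoted idea could look like in code: scalar process parameters are tiled to feature-map size and concatenated into a small network at an early, middle, or last stage. The class, channel sizes, and the `insert_at` switch are illustrative assumptions, not the cited model.

```python
import torch
import torch.nn as nn

class ParamConditionedNet(nn.Module):
    """Illustrative only: inject extra (process) parameters at the
    early, middle, or last stage of a small convolutional network."""
    def __init__(self, num_params=2, insert_at="mid", num_classes=10):
        super().__init__()
        self.insert_at = insert_at
        extra = num_params if insert_at == "early" else 0
        self.stage1 = nn.Sequential(nn.Conv2d(3 + extra, 32, 3, padding=1), nn.ReLU())
        extra = num_params if insert_at == "mid" else 0
        self.stage2 = nn.Sequential(nn.Conv2d(32 + extra, 64, 3, padding=1), nn.ReLU())
        extra = num_params if insert_at == "last" else 0
        self.head = nn.Conv2d(64 + extra, num_classes, 1)

    @staticmethod
    def _tile(params, like):
        # broadcast a (N, P) parameter vector to (N, P, H, W) feature maps
        n, p = params.shape
        return params.view(n, p, 1, 1).expand(n, p, like.shape[2], like.shape[3])

    def forward(self, x, params):
        if self.insert_at == "early":
            x = torch.cat([x, self._tile(params, x)], dim=1)
        x = self.stage1(x)
        if self.insert_at == "mid":
            x = torch.cat([x, self._tile(params, x)], dim=1)
        x = self.stage2(x)
        if self.insert_at == "last":
            x = torch.cat([x, self._tile(params, x)], dim=1)
        return self.head(x)

net = ParamConditionedNet(insert_at="mid")
out = net(torch.randn(2, 3, 32, 32), torch.randn(2, 2))
print(out.shape)  # torch.Size([2, 10, 32, 32])
```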