2021
DOI: 10.1007/s00371-021-02166-7
A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Abstract: The research progress in multimodal learning has grown rapidly over the last decade in several areas, especially in computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing universality of deep multimodal learning. This involves the development of models capable of processing and analyzing the multimodal information uniformly. Unstructured real-world data can inherently take many forms, also known as modalities, often including visual and …

Cited by 124 publications (47 citation statements) · References 219 publications (232 reference statements)
“…For the 2D branch, we adopt U-Net [21] with a ResNet34 [10] encoder. For the 3D branch, we use a U-Net (downsampling 6 times) that applies sparse convolution [9] to the voxelized point cloud input, where we use either SparseConvNet [8] or MinkowskiNet [5] for our settings 1 . For each setting, all the baseline comparisons are evaluated using the same framework and backbone models.…”
Section: Implementation Details
confidence: 99%
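The quoted setup feeds a voxelized point cloud to a sparse-convolution backbone. As a rough, self-contained illustration of what "voxelized" input looks like (not the actual SparseConvNet/MinkowskiNet preprocessing — the function name and averaging choice here are hypothetical), points are binned into integer voxel coordinates and the points in each occupied voxel are pooled into one feature:

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Bin 3D points into voxels of side `voxel_size` and average the
    points in each occupied voxel. Returns (coords, feats): the sparse
    coordinate list and per-voxel features that sparse-conv backbones
    such as SparseConvNet or MinkowskiNet consume."""
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    coords, feats = [], []
    for key, pts in buckets.items():
        coords.append(key)
        # Average the coordinates of all points falling in this voxel.
        feats.append(tuple(sum(c) / len(pts) for c in zip(*pts)))
    return coords, feats

# Three points, two of which share a voxel at size 0.5:
coords, feats = voxelize([(0.1, 0.2, 0.3), (0.15, 0.22, 0.31), (1.5, 0.0, 0.0)], 0.5)
print(len(coords))  # 2 occupied voxels
```

Only occupied voxels are stored, which is what makes sparse convolution tractable on large outdoor scenes where most of the volume is empty.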
“…However, multi-modal data is sensitive to distribution shift at test time when a domain gap exists between the test and training data [1]. Therefore, it is critical for a model to adapt quickly to the new multi-modal data during testing to obtain better performance, i.e., through test-time adaptation (TTA) [19,31].…”
Section: Introduction
confidence: 99%
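One common family of test-time adaptation methods updates normalization statistics from the incoming test batch rather than relying solely on statistics frozen at training time. A minimal one-dimensional sketch of that idea (the function and its momentum scheme are illustrative, not the cited papers' method):

```python
def tta_normalize(batch, train_mean, train_var, momentum=0.1, eps=1e-5):
    """Blend stored training statistics with statistics computed from the
    current test batch, then normalize the batch with the blended values.
    Returns (normalized_batch, adapted_mean, adapted_var)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    # Move the stored statistics a small step toward the test-batch statistics.
    mean = (1 - momentum) * train_mean + momentum * batch_mean
    var = (1 - momentum) * train_var + momentum * batch_var
    normalized = [(x - mean) / (var + eps) ** 0.5 for x in batch]
    return normalized, mean, var

# A shifted test batch pulls the adapted mean away from the training mean of 0:
out, mean, var = tta_normalize([10.0, 10.0, 10.0], train_mean=0.0, train_var=1.0)
print(mean)  # 1.0
```

The appeal of this style of TTA is that it needs no labels and no extra training pass: the statistics are refreshed on the fly as test data streams in.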
“…Multimodal deep learning technology synthesizes information obtained from two or more modalities during analysis, realizes information complementarity, and improves the precision and robustness of prediction results. Previous studies have described various models, algorithms, and development trends for multimodal learning (Bayoudh et al. 2021; Zubatiuk and Isayev 2021). In recent years, research on curative-effect and prognostic analyses has applied multimodal technology for joint feature learning and cross-modal relationship modeling (Cheerla and Gevaert 2019; Hosseini et al. 2020; Hügle et al. 2021; Yao et al. 2020b).…”
Section: Multimodal Learning (MML)
confidence: 99%
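The "information complement" the quote describes is often realized by late fusion: each modality produces its own class scores, and the scores are combined. A minimal sketch of weighted-average late fusion (the function and weights are hypothetical, for illustration only):

```python
def late_fusion(modality_scores, weights=None):
    """Combine per-modality class-score vectors (e.g., one from an imaging
    branch, one from a clinical-records branch) by weighted averaging.
    With no weights given, modalities contribute equally."""
    if weights is None:
        weights = [1.0 / len(modality_scores)] * len(modality_scores)
    n_classes = len(modality_scores[0])
    fused = [0.0] * n_classes
    for w, scores in zip(weights, modality_scores):
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused

# Two modalities, two classes, equal weights:
fused = late_fusion([[0.8, 0.2], [0.6, 0.4]])
```

Weighting lets a more reliable modality dominate; a complementary alternative is early fusion, where features are concatenated before prediction so the model can learn cross-modal interactions directly.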
“…preprocessing 40,41 , clustering 42,43 , cell-type identification 44,45 and data augmentation 46,47 ), and have been shown to significantly improve upon traditional methods 10 , suggesting the potential of such methods in ST analysis. Moreover, DL models can leverage multiple data sources, such as image and text data, to learn a set of tasks 48 . Given that spatially resolved transcriptomics is inherently multimodal (i.e.…”
Section: Introduction
confidence: 99%