2021
DOI: 10.1007/s00371-021-02166-7
A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Abstract: The research progress in multimodal learning has grown rapidly over the last decade in several areas, especially in computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing universality of deep multimodal learning. This involves the development of models capable of processing and analyzing the multimodal information uniformly. Unstructured real-world data can inherently take many forms, also known as modalities, often including visual and …

Cited by 124 publications (47 citation statements) · References 219 publications (232 reference statements)
“…For the 2D branch, we adopt U-Net [21] with a ResNet34 [10] encoder. For the 3D branch, we use a U-Net (downsampling 6 times) that applies sparse convolution [9] to the voxelized point cloud input, where we use either SparseConvNet [8] or MinkowskiNet [5] for our settings 1 . For each setting, all the baseline comparisons are evaluated using the same framework and backbone models.…”
Section: Implementation Details
confidence: 99%
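The quoted setup feeds a voxelized point cloud to a sparse-convolution backbone. As a rough, self-contained illustration of what "voxelized" input looks like (not the actual SparseConvNet/MinkowskiNet preprocessing — the function name and averaging choice here are hypothetical), points are binned into integer voxel coordinates and the points in each occupied voxel are pooled into one feature:

```python
from collections import defaultdict

def voxelize(points, voxel_size):
    """Bin 3D points into voxels of side `voxel_size` and average the
    points in each occupied voxel. Returns (coords, feats): the sparse
    coordinate list and per-voxel features that sparse-conv backbones
    such as SparseConvNet or MinkowskiNet consume."""
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        buckets[key].append((x, y, z))
    coords, feats = [], []
    for key, pts in buckets.items():
        coords.append(key)
        # Average the coordinates of all points falling in this voxel.
        feats.append(tuple(sum(c) / len(pts) for c in zip(*pts)))
    return coords, feats

# Three points, two of which share a voxel at size 0.5:
coords, feats = voxelize([(0.1, 0.2, 0.3), (0.15, 0.22, 0.31), (1.5, 0.0, 0.0)], 0.5)
print(len(coords))  # 2 occupied voxels
```

Only occupied voxels are stored, which is what makes sparse convolution tractable on large outdoor scenes where most of the volume is empty.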
“…However, multi-modal data is sensitive to distribution shift at test time when a domain gap exists between the test and training data [1]. Therefore, it is critical for a model to adapt quickly to the new multi-modal data during testing to obtain better performance, i.e., through test-time adaptation (TTA) [19,31].…”
Section: Introduction
confidence: 99%
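One common family of test-time adaptation methods updates normalization statistics from the incoming test batch rather than relying solely on statistics frozen at training time. A minimal one-dimensional sketch of that idea (the function and its momentum scheme are illustrative, not the cited papers' method):

```python
def tta_normalize(batch, train_mean, train_var, momentum=0.1, eps=1e-5):
    """Blend stored training statistics with statistics computed from the
    current test batch, then normalize the batch with the blended values.
    Returns (normalized_batch, adapted_mean, adapted_var)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    # Move the stored statistics a small step toward the test-batch statistics.
    mean = (1 - momentum) * train_mean + momentum * batch_mean
    var = (1 - momentum) * train_var + momentum * batch_var
    normalized = [(x - mean) / (var + eps) ** 0.5 for x in batch]
    return normalized, mean, var

# A shifted test batch pulls the adapted mean away from the training mean of 0:
out, mean, var = tta_normalize([10.0, 10.0, 10.0], train_mean=0.0, train_var=1.0)
print(mean)  # 1.0
```

The appeal of this style of TTA is that it needs no labels and no extra training pass: the statistics are refreshed on the fly as test data streams in.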
“…Multimodal deep learning technology synthesizes information obtained from two or more modalities during analysis, realizes information complementarity, and improves the precision and robustness of prediction results. Previous studies have described various models, algorithms, and development trends for multimodal learning (Bayoudh et al. 2021; Zubatiuk and Isayev 2021). In recent years, research on curative-effect and prognostic analyses has applied multimodal technology for joint feature learning and cross-modal relationship modeling (Cheerla and Gevaert 2019; Hosseini et al. 2020; Hügle et al. 2021; Yao et al. 2020b).…”
Section: Multimodal Learning (MML)
confidence: 99%
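The "information complement" the quote describes is often realized by late fusion: each modality produces its own class scores, and the scores are combined. A minimal sketch of weighted-average late fusion (the function and weights are hypothetical, for illustration only):

```python
def late_fusion(modality_scores, weights=None):
    """Combine per-modality class-score vectors (e.g., one from an imaging
    branch, one from a clinical-records branch) by weighted averaging.
    With no weights given, modalities contribute equally."""
    if weights is None:
        weights = [1.0 / len(modality_scores)] * len(modality_scores)
    n_classes = len(modality_scores[0])
    fused = [0.0] * n_classes
    for w, scores in zip(weights, modality_scores):
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused

# Two modalities, two classes, equal weights:
fused = late_fusion([[0.8, 0.2], [0.6, 0.4]])
```

Weighting lets a more reliable modality dominate; a complementary alternative is early fusion, where features are concatenated before prediction so the model can learn cross-modal interactions directly.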
“…preprocessing 40,41 , clustering 42,43 , cell-type identification 44,45 and data augmentation 46,47 ), and have been shown to significantly improve upon traditional methods 10 , suggesting the potential of such methods in ST analysis. Moreover, DL models can leverage multiple data sources, such as image and text data, to learn a set of tasks 48 . Given that spatially resolved transcriptomics is inherently multimodal (i.e.…”
Section: Introduction
confidence: 99%