Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
DOI: 10.18653/v1/2021.eacl-main.275

Adaptive Fusion Techniques for Multimodal Data

Abstract: Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data. In this paper, we propose adaptive fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide "how" to combine a given set of multimodal features more effectively. We propose two networks: 1) Auto-Fusion, which learns to…
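To make the idea concrete, here is a minimal PyTorch sketch of an Auto-Fusion-style module, assuming the mechanism described in the abstract: per-modality features are concatenated, compressed into a fused code, and trained with a reconstruction loss so the fused code retains the information in its inputs. Layer sizes, activations, and class names are illustrative, not the paper's exact configuration.

```python
# Minimal sketch of an Auto-Fusion-style module (PyTorch).
# Assumptions: per-modality feature vectors are concatenated, compressed
# into a fused code, and the fused code is trained to reconstruct its own
# input. All sizes and activations here are illustrative.
import torch
import torch.nn as nn

class AutoFusion(nn.Module):
    def __init__(self, input_dim: int, fused_dim: int):
        # input_dim must equal the sum of the per-modality feature dims
        super().__init__()
        self.compress = nn.Sequential(nn.Linear(input_dim, fused_dim), nn.Tanh())
        self.reconstruct = nn.Linear(fused_dim, input_dim)
        self.criterion = nn.MSELoss()

    def forward(self, features):
        concat = torch.cat(features, dim=-1)        # (batch, input_dim)
        fused = self.compress(concat)               # (batch, fused_dim)
        recon = self.reconstruct(fused)             # (batch, input_dim)
        recon_loss = self.criterion(recon, concat)  # fusion objective
        return fused, recon_loss

# Example: fuse 128-d video, 256-d text, and 64-d audio features
fusion = AutoFusion(input_dim=128 + 256 + 64, fused_dim=128)
video, text, audio = torch.randn(8, 128), torch.randn(8, 256), torch.randn(8, 64)
fused, aux_loss = fusion([video, text, audio])
```

The reconstruction term is what makes the fusion adaptive: the network, rather than a fixed operation such as concatenation, decides what to keep in the fused code.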

Cited by 12 publications (7 citation statements) · References 23 publications (25 reference statements)
“…In this study, we leverage Auto-Fusion [12] as our approach for multimodal synthesis, a method proven to enhance the model’s ability to extract intermodal features by optimizing the correlation between the different input modalities. Our Auto-Fusion architecture (as illustrated in Figure 5) consists of two primary components: the input feature fusion module and the reconstruction module.…”
Section: Methods (mentioning)
confidence: 99%
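Building on the sketch above, a hedged example of how the reconstruction objective might be combined with a downstream task loss in a single training step; the classifier `head`, the loss weight `lam`, and the overall signature are hypothetical, not taken from the cited paper.

```python
# Hedged sketch of one training step combining the Auto-Fusion
# reconstruction objective with a downstream task loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(fusion: nn.Module, head: nn.Module,
               optimizer: torch.optim.Optimizer,
               feats: list, labels: torch.Tensor,
               lam: float = 1.0) -> float:
    # `fusion` returns (fused, recon_loss); `head` maps fused -> logits
    optimizer.zero_grad()
    fused, recon_loss = fusion(feats)
    task_loss = F.cross_entropy(head(fused), labels)
    loss = task_loss + lam * recon_loss  # joint objective
    loss.backward()
    optimizer.step()
    return loss.item()
```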
“…These methods are all designed to capture the unique characteristics of each protein modality, while also preserving the symmetries that are inherent to proteins. Additionally, we utilize Auto-Fusion [67] to synthesize a joint representation from these pretrained models, encouraging effective intermodal information extraction. All of our contributions allow us to produce more informative, robust, and unified representation of proteins, which can lead to significant improvements in a variety of protein-related tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Memory-based fusion for multi-view sequential learning models the modality-specific and cross-modal interactions in multi-view datasets [128]. A dynamic adaptive fusion scheme, in which the network decides the optimal way to fuse the modalities, is presented in [129]. Cross-modal fusion that exploits correlation across modalities by exchanging modality sub-networks is interpretable to a large extent [130].…”
Section: C (mentioning)
confidence: 99%
“…However, the resultant architecture either poses a significant computational overhead or further adds to the complexity of a fusion model. We use GAN-Fusion and Auto-Fusion, two adaptive fusion mechanisms that outperform their massive counterparts on challenging multimodal tasks [41], described in more detail in Section 3.…”
Section: Multimodal Deep Learning (mentioning)
confidence: 99%
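For completeness, a rough sketch of a GAN-Fusion-style adversarial objective, under the assumption that a generator fuses the non-target ("ambient") modalities to mimic the target modality's encoding while a discriminator tries to tell the two apart; all module names, shapes, and losses are illustrative only, not the paper's exact design.

```python
# Rough sketch of a GAN-Fusion-style adversarial objective (PyTorch).
# Assumption: the generator fuses the ambient modalities to mimic the
# target modality's encoding; the discriminator distinguishes the two.
import torch
import torch.nn as nn

class GanFusion(nn.Module):
    def __init__(self, ambient_dim: int, target_dim: int):
        super().__init__()
        self.generator = nn.Sequential(nn.Linear(ambient_dim, target_dim), nn.Tanh())
        self.discriminator = nn.Linear(target_dim, 1)
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, ambient: torch.Tensor, target: torch.Tensor):
        fused = self.generator(ambient)            # mimic target's encoding
        real = self.discriminator(target)
        fake = self.discriminator(fused.detach())  # detach for the D step
        d_loss = (self.bce(real, torch.ones_like(real)) +
                  self.bce(fake, torch.zeros_like(fake)))
        g_score = self.discriminator(fused)
        g_loss = self.bce(g_score, torch.ones_like(g_score))
        return fused, d_loss, g_loss
```

In practice the generator and discriminator would be updated with separate optimizers, alternating the `d_loss` and `g_loss` steps.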
“…Since multimodal data is highly heterogeneous, we use two adaptive fusion mechanisms to effectively model inter- and intra-modal dynamics [40,41]. In addition to addressing heterogeneity, these architectures perform impressively on the task of multimodal fusion, despite having significantly fewer parameters than transformers.…”
Section: Fusion Modules (mentioning)
confidence: 99%
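The parameter-efficiency claim is easy to check for any concrete model; here is a small utility, not from the cited papers, that counts the trainable parameters of a PyTorch module:

```python
# Small utility (illustrative, not from the cited papers) to compare
# parameter counts between a fusion module and a transformer baseline.
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # count only trainable parameters
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```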