2021
DOI: 10.1007/978-981-16-8531-6_8

Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification

Cited by 9 publications
(3 citation statements)
References 17 publications
“…Experiments with more complex feature fusion models have also been carried out. The co-attention and cross-attention techniques proposed in Zhang et al [93] did not improve the results compared to the selected fusion method. Furthermore, we investigate whether using a long short-term memory (LSTM) network in the output of the text model, as suggested by Gallo et al [94], yields improved classification results; however, this did not happen.…”
Section: Multimodal Classification (mentioning)
confidence: 81%
“…The resulting Q contains information from the static modality that is correlated with the time series modality. Unlike the aforementioned strategies, an attention-based mechanism can accurately model correlated parts between modalities [57]. Ideally, this enables the model to learn only the valuable information from the static modality.…”
Section: Attention-based Fusion (mentioning)
confidence: 99%
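
The attention-based fusion described in the statement above can be illustrated with a generic scaled dot-product cross-attention. This is only a sketch under stated assumptions, not the citing paper's formulation: here the time series features supply the queries and the static features supply the keys and values, and every name, shape, and projection matrix (`cross_attention`, `w_q`, `w_k`, `w_v`, the token counts) is hypothetical.

```python
import numpy as np

def cross_attention(ts_feats, static_feats, w_q, w_k, w_v):
    """Attend from time series steps (queries) to static tokens (keys/values).

    Assumed layout: ts_feats is (T, d_ts), static_feats is (S, d_static).
    """
    q = ts_feats @ w_q          # (T, d) queries from the time series modality
    k = static_feats @ w_k      # (S, d) keys from the static modality
    v = static_feats @ w_v      # (S, d) values from the static modality

    # Scaled dot-product scores, then a numerically stable row-wise softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (T, S)
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)

    # Each output row is a mix of static values, weighted by how strongly
    # that time step correlates with each static token.
    return attn @ v, attn

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
T, S, d_ts, d_static, d = 5, 3, 8, 4, 6
ts_feats = rng.normal(size=(T, d_ts))
static_feats = rng.normal(size=(S, d_static))
w_q = rng.normal(size=(d_ts, d))
w_k = rng.normal(size=(d_static, d))
w_v = rng.normal(size=(d_static, d))

attended, attn = cross_attention(ts_feats, static_feats, w_q, w_k, w_v)
```

Each row of `attn` sums to one, so `attended` carries only static information, reweighted per time step. In a trained model the projection matrices would be learned rather than sampled.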
“…Late fusion, on the other hand, processes each modality separately and fuses the resulting logits or decision scores. Various techniques, from simple averaging to attention-based methods, are used in existing works [3,22,24,32,37,38]. The fusion strategy significantly impacts the system's robustness and accuracy, especially when modalities provide conflicting cues.…”
Section: Multi-modal Features Fusion (mentioning)
confidence: 99%
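
The simplest late-fusion variant mentioned in the statement above, averaging decision scores, might look like the following minimal sketch. It assumes each modality already has its own classifier that emits logits over the same classes; the function names and the optional `weights` parameter are illustrative and do not come from any of the cited works.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def late_fusion_average(logits_per_modality, weights=None):
    """Fuse per-modality logits by a (weighted) average of probabilities.

    logits_per_modality: list of arrays, each (batch, num_classes).
    weights: optional per-modality weights; uniform when omitted.
    """
    probs = np.stack(
        [softmax(np.asarray(l, dtype=float)) for l in logits_per_modality]
    )  # (num_modalities, batch, num_classes)
    if weights is None:
        weights = np.ones(len(logits_per_modality))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the output stays a distribution
    return np.tensordot(weights, probs, axes=1)  # (batch, num_classes)

# Two modalities that disagree on a single two-class example.
image_logits = np.array([[2.0, 0.0]])
text_logits = np.array([[0.0, 2.0]])

fused_uniform = late_fusion_average([image_logits, text_logits])
fused_weighted = late_fusion_average([image_logits, text_logits], weights=[3, 1])
```

With uniform weights the two conflicting predictions cancel out to an even split, which shows why the choice of fusion rule matters when modalities provide conflicting cues. Weighting the image branch more heavily tips the decision toward its class.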