2019
DOI: 10.1109/access.2019.2955637

A Hybrid Latent Space Data Fusion Method for Multimodal Emotion Recognition

Abstract: Multimodal emotion recognition is an emerging interdisciplinary field of research in the area of affective computing and sentiment analysis. It aims at exploiting the information carried by signals of different nature to make emotion recognition systems more accurate. This is achieved by employing a powerful multimodal fusion method. In this study, a hybrid multimodal data fusion method is proposed in which the audio and visual modalities are fused using a latent space linear map and then, their projected feat…

Cited by 55 publications (25 citation statements)
References 79 publications (176 reference statements)
“…Data fusion is a critical step in multimodal emotion recognition for producing the final estimation. The literature on emotional data fusion distinguishes three techniques: early fusion (feature fusion) [42,43], late fusion (decision fusion) [44,45,46], and hybrid approaches [17,47,48].…”
Section: Background and Literature Review on Multimodal Emotion Recognition
confidence: 99%
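
To make the distinction quoted above concrete, the following minimal Python sketch contrasts early fusion (concatenating features into a single classifier) with late fusion (combining per-modality decision scores). The random feature arrays, dimensions, and choice of classifier are illustrative assumptions, not the setups of the cited works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-modality features for 100 samples (illustrative only).
rng = np.random.default_rng(0)
audio = rng.standard_normal((100, 32))   # e.g., acoustic descriptors
visual = rng.standard_normal((100, 64))  # e.g., facial-expression features
labels = rng.integers(0, 2, 100)

# Early fusion (feature fusion): concatenate features, train one classifier.
early_model = LogisticRegression().fit(np.hstack([audio, visual]), labels)

# Late fusion (decision fusion): one classifier per modality, then combine
# their class-probability outputs (here, a simple average).
audio_model = LogisticRegression().fit(audio, labels)
visual_model = LogisticRegression().fit(visual, labels)
late_scores = (audio_model.predict_proba(audio) +
               visual_model.predict_proba(visual)) / 2
late_pred = late_scores.argmax(axis=1)
```

Hybrid approaches, as the next statement describes, chain the two levels: feature-level fusion first, with decision-level fusion on top.
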
“…In [64], a simple hybrid fusion was employed in which the output of an early-fusion classifier feeds a decision-level fusion system. A recent study in [48] uses a latent space map to fuse the audio and video modalities; then, using a Dempster-Shafer (DS) theory-based evidential fusion method, the features projected onto the cross-modal space are fused with the textual modality.…”
Section: Background and Literature Review on Multimodal Emotion Recognition
confidence: 99%
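
A rough sketch of such a staged hybrid pipeline, under the same illustrative assumptions as above; a plain weighted average stands in for the Dempster-Shafer combination used in the cited study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
audio = rng.standard_normal((100, 32))
visual = rng.standard_normal((100, 64))
text = rng.standard_normal((100, 50))  # e.g., textual embeddings
labels = rng.integers(0, 2, 100)

# Stage 1 (feature level): early-fuse audio and visual into one classifier.
av_features = np.hstack([audio, visual])
av_model = LogisticRegression().fit(av_features, labels)

# Stage 2 (decision level): combine the stage-1 scores with a text
# classifier's scores. The weighted average here is a stand-in for the
# evidential (Dempster-Shafer) combination of the cited study.
text_model = LogisticRegression().fit(text, labels)
hybrid_scores = (0.5 * av_model.predict_proba(av_features)
                 + 0.5 * text_model.predict_proba(text))
hybrid_pred = hybrid_scores.argmax(axis=1)
```
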
“…To address this problem, we exploit an evidential fusion method based on the Dempster-Shafer (D-S) theory. The D-S method is one of the most prominent score fusion methods and has been exploited in recent years for polarity detection [52], rating prediction [15], multimodal emotion recognition [53], and project risk assessment [54]. To handle uncertainty in the validity of hypotheses, Dempster and Shafer presented a general form of Bayesian theory in which multiple probabilities (e.g., derived from multiple classifiers' outputs) are used to determine the final output on the basis of evidence from uncertain outputs [51].…”
Section: Score Fusion Methods Using an Evidential Approach
confidence: 99%
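
For reference, Dempster's rule of combination keeps the products of intersecting hypotheses and renormalizes by the non-conflicting mass. A self-contained sketch follows; the frame of discernment and the mass values are made up for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset hypotheses to
    masses) with Dempster's rule: products of intersecting hypotheses are
    kept and renormalized by 1 - K, where K is the conflicting mass."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are irreconcilable")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Belief masses from two hypothetical classifiers over {happy, sad};
# mass on the full set encodes each source's remaining uncertainty.
m_audio = {frozenset({"happy"}): 0.7, frozenset({"happy", "sad"}): 0.3}
m_video = {frozenset({"sad"}): 0.4, frozenset({"happy", "sad"}): 0.6}
print(dempster_combine(m_audio, m_video))
# {happy}: ~0.583, {sad}: ~0.167, {happy, sad}: 0.25  (conflict K = 0.28)
```
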
“…This method assigns different weights based on the correlation and the level of confidence. It has recently been improved for the same task using a hybrid architecture consisting of latent information obtained through canonical correlation analysis (CCA) [61] and Marginal Fisher Analysis (MFA) [53]. As stated in some previous studies, the original D-S theory has some limitations [51]; one of the most influential is the production of contradictory results.…”
Section: Score Fusion Methods Using an Evidential Approach
confidence: 99%
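
The CCA step mentioned above learns paired linear maps that maximize the correlation between two views, which is one way to realize the latent space projection discussed earlier. A minimal sketch follows; the feature matrices, their dimensions, and the component count are assumptions for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
# Hypothetical paired audio/visual features for the same 200 utterances.
audio = rng.standard_normal((200, 40))
visual = rng.standard_normal((200, 60))

# Learn linear maps that maximize correlation between the two views,
# projecting both modalities into a shared 10-dimensional latent space.
cca = CCA(n_components=10)
audio_latent, visual_latent = cca.fit_transform(audio, visual)

# One possible fused cross-modal representation: concatenated projections.
fused = np.hstack([audio_latent, visual_latent])
```
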
“…With regard to multi-modal data, whether for supervised or semi-supervised learning, we should effectively capture the knowledge that is independent within each modality [35]. It is worth stressing that capturing independent knowledge from a single modality is particularly challenging for this task, since multi-modal sentiment analysis is performed on spoken language.…”
Section: Introduction
confidence: 99%