Interspeech 2019
DOI: 10.21437/Interspeech.2019-2044

Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile

Abstract: A growing number of human-centered applications benefit from continuous advancements in emotion recognition technology. Many emotion recognition algorithms have been designed to model multimodal behavioral cues to achieve high performance. However, most of them do not consider the modulating factors of an individual's personal attributes in his/her expressive behaviors. In this work, we propose a Personalized Attributes-Aware Attention Network (PAaAN) with a novel personalized attention mechanism to perform…
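The truncated abstract points to attention that is conditioned on a speaker's personal attributes. As a rough, hypothetical illustration of that idea (not the paper's actual PAaAN architecture; every name, dimension, and projection below is an assumption for exposition), the following numpy sketch derives an attention query from an attribute profile and uses it to pool frame-level multimodal features:

```python
# Minimal sketch of attribute-conditioned attention pooling.
# All shapes, weights, and the profile-to-query projection are assumptions,
# not the published PAaAN design.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def personalized_attention(frames, profile, W_q, W_k):
    """Pool frame-level features with attention conditioned on a profile.

    frames : (T, d) frame-level multimodal features
    profile: (p,)   personal-attribute embedding (hypothetical)
    """
    query = profile @ W_q                  # (h,) query derived from the profile
    keys = frames @ W_k                    # (T, h) one key per frame
    scores = keys @ query / np.sqrt(keys.shape[1])
    alpha = softmax(scores)                # (T,) attention weight per frame
    return alpha @ frames                  # (d,) utterance-level representation

T, d, p, h = 50, 64, 8, 32
frames = rng.standard_normal((T, d))
profile = rng.standard_normal(p)           # e.g., encoded speaker attributes
W_q = rng.standard_normal((p, h)) * 0.1
W_k = rng.standard_normal((d, h)) * 0.1
utterance_vec = personalized_attention(frames, profile, W_q, W_k)
print(utterance_vec.shape)                 # (64,)
```

The point of the sketch is only that two speakers with different attribute profiles produce different attention weights over the same behavioral frames, which is the modulation the abstract describes.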

Cited by 22 publications (14 citation statements)
References 22 publications (26 reference statements)
“…Recent studies employ multi-task learning to construct gender-dependent models without inputting speaker attributes [18, 27]. Personal profiles have also been utilized for speaker-dependent emotion recognition [28]. In this paper, we do not employ speaker adaptation, so as to isolate the influence of listener dependency alone; it would, however, be possible to combine the proposed LD model with existing speaker adaptation methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, more research effort has focused on auxiliary information and innovative ways to assist emotion recognition. For example, transcripts, language cues, and cross-cultural information have been adopted in emotion recognition [25], [36], [37]. In [38], conditioned data augmentation using generative adversarial networks (GANs) was explored to address the problem of data imbalance in SER tasks.…”
Section: Related Work, A. Audio-Based Emotion Recognition (mentioning)
confidence: 99%
“…[24] proposed to bridge the emotional gap using a hybrid deep model, which first produces audio-visual segment features with convolutional neural networks (CNNs) and a 3D-CNN, then fuses them in deep belief networks (DBNs). In [25], the different modalities were concatenated after an encoder, which yielded significant improvements. In our recent work [26], we introduced global-trunk-based factorized bilinear pooling (G-FBP) to integrate the audio and visual features, achieving state-of-the-art performance.…”
Section: Introduction (mentioning)
confidence: 99%
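The G-FBP method cited above builds on factorized bilinear pooling. As a generic sketch of plain factorized bilinear pooling for audio-visual fusion (the global-trunk extension of [26] is not reproduced; the dimensions, initialization, and function names below are assumptions):

```python
# Sketch of factorized bilinear pooling: a rank-constrained bilinear
# interaction between two modality vectors, followed by the customary
# power and l2 normalization. Shapes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def factorized_bilinear_pool(x, y, U, V, k):
    """Fuse an audio vector x (dx,) and a visual vector y (dy,).

    U: (dx, d*k) and V: (dy, d*k) are the low-rank factor matrices;
    the fused output has dimension d.
    """
    joint = (x @ U) * (y @ V)                 # (d*k,) elementwise interaction
    z = joint.reshape(-1, k).sum(axis=1)      # sum-pool over the k factors
    z = np.sign(z) * np.sqrt(np.abs(z))       # signed square-root (power norm)
    return z / (np.linalg.norm(z) + 1e-8)     # l2 normalization

dx, dy, d, k = 128, 256, 64, 4
x = rng.standard_normal(dx)                   # audio embedding
y = rng.standard_normal(dy)                   # visual embedding
U = rng.standard_normal((dx, d * k)) * 0.05
V = rng.standard_normal((dy, d * k)) * 0.05
fused = factorized_bilinear_pool(x, y, U, V, k)
print(fused.shape)                            # (64,)
```

Compared with plain concatenation as in [25], the bilinear form lets every audio dimension interact multiplicatively with every visual dimension while the factorization keeps the parameter count manageable.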
“…Features obtained from each model were fused using a DNN to classify the emotion. Li et al. [8] proposed a personalized attribute-aware attention mechanism in which an attention profile is learned based on acoustic and lexical behavior data. Mirsamadi et al. [15] used deep learning along with local attention to automatically extract relevant features, where segment-level acoustic features are aggregated into an utterance-level emotion representation.…”
Section: Introduction (mentioning)
confidence: 99%
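As a generic illustration of the segment-to-utterance aggregation attributed to Mirsamadi et al. [15] (a minimal sketch with assumed dimensions and randomly initialized weights, not the paper's exact formulation):

```python
# Sketch of soft attention pooling: segment-level acoustic features are
# scored by a learned context vector and combined into one utterance-level
# emotion representation. All parameters here are placeholders.
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(segments, W, u):
    """segments: (T, d) segment-level acoustic features."""
    hidden = np.tanh(segments @ W)    # (T, h) nonlinear projection per segment
    alpha = softmax(hidden @ u)       # (T,) relevance weight per segment
    return alpha @ segments           # (d,) weighted utterance representation

T, d, h = 120, 40, 16
segments = rng.standard_normal((T, d))
W = rng.standard_normal((d, h)) * 0.1
u = rng.standard_normal(h)            # learned context (here random)
utt = attention_pool(segments, W, u)
print(utt.shape)                      # (40,)
```

Unlike mean pooling, the attention weights let emotionally salient segments dominate the utterance-level representation.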