2014
DOI: 10.1017/atsip.2014.11
Survey on audiovisual emotion recognition: databases, features, and data fusion strategies

Abstract: Emotion recognition is the ability to identify what someone is feeling from moment to moment and to understand the connection between his/her feelings and expressions. In today's world, the human-computer interaction (HCI) interface undoubtedly plays an important role in our daily life. Toward a harmonious HCI interface, automated analysis and recognition of human emotion have attracted increasing attention from researchers in multidisciplinary research fields. In this paper, a survey on the theor…

Cited by 141 publications (86 citation statements)
References 100 publications
“…In [6], C. H. Wu et al. presented a survey on theoretical and practical work, offering new and broad views of the latest research in emotion recognition from bimodal information, including facial and vocal expressions.…”
Section: Literature Survey
confidence: 99%
“…What was of interest to us was the feature fusion. In general, the fusion methods used in multimodal continuous dimensional emotion recognition can be divided into feature-level, decision-level, and model-level fusion, and mixed approaches [1], [7].…”
Section: Introduction
confidence: 99%
“…For feature-level fusion, the information from multiple modalities is combined to generate the recognition feature [1], [7]. The simplest method is to construct a joint feature, used as the input of a regression model, by concatenating the features from all modalities [1], [7], [11], [16]–[19]. Additionally, many other feature-level fusion strategies have been proposed.…”
Section: Introduction
confidence: 99%
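The concatenation-based feature-level fusion described in the statement above can be sketched as follows. This is a minimal illustration, not code from any of the cited papers; the feature vectors and their dimensions are invented for the example.

```python
import numpy as np

def early_fusion(audio_feat: np.ndarray, visual_feat: np.ndarray) -> np.ndarray:
    """Feature-level (early) fusion: concatenate per-modality feature
    vectors into one joint vector, which would then be fed to a
    regression model for continuous emotion prediction."""
    return np.concatenate([audio_feat, visual_feat])

# Toy per-modality features (dimensions are illustrative)
audio = np.array([0.2, 0.5, 0.1])   # e.g. prosodic/spectral features
visual = np.array([0.7, 0.3])       # e.g. facial expression features

joint = early_fusion(audio, visual)  # joint vector of length 3 + 2 = 5
```

A regressor trained on `joint` sees all modalities at once, which is the simplicity (and the curse-of-dimensionality risk) that motivates the other fusion strategies the quote mentions.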
“…In the multi-modal fusion domain, many approaches have attempted to jointly learn temporal features from multiple modalities (Wu et al, 2014a), such as feature-level (early) fusion (Ngiam et al, 2011; Ramanishka et al, 2016), decision-level (late) fusion (He et al, 2015), model-level fusion (Wu et al, 2014b), and attention fusion (Chen …”
Section: Introduction
confidence: 99%