Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge 2016
DOI: 10.1145/2988257.2988264

Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction

Cited by 96 publications (70 citation statements)
References 23 publications
“…These trends are exemplified in the annual competitions Emotion Recognition in the Wild (EmotiW) [12] and the Audio/Visual Emotion Challenge (AVEC) [13]. Since 2010, deep learning methods have been applied to affect recognition problems across multiple modalities and led to improvements in accuracy, including winning performances at EmotiW [14], [15], [16] and AVEC [17], [18], [19].…”
Section: Introduction (mentioning)
confidence: 99%
“…One limitation of our current LSTM models is that we do not leverage the ability of neural networks to extract features directly from the raw data. For example, many previous models use a CNN on the raw images to extract visual features (e.g., [36], [37]), rather than calculating visual features separately as we did here. The weights of such a CNN will be modified during training, which "optimizes" the feature extraction process for this particular task.…”
Section: LSTM Results (mentioning)
confidence: 99%
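To make the end-to-end point in that excerpt concrete, here is a minimal sketch (assuming PyTorch; the model, layer sizes, and loss are hypothetical illustrations, not the cited papers' architectures) in which a small CNN over raw frames is trained jointly with the affect regressor, so gradients reach the feature extractor and "optimize" it for the task:

```python
import torch
import torch.nn as nn

class EndToEndAffectModel(nn.Module):
    """Hypothetical model: CNN features are learned jointly with the regressor."""
    def __init__(self, n_outputs=2):  # e.g., valence and arousal
        super().__init__()
        self.cnn = nn.Sequential(      # feature extractor over raw frames
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (batch, 32, 1, 1)
        )
        self.regressor = nn.Linear(32, n_outputs)

    def forward(self, frames):         # frames: (batch, 3, H, W)
        feats = self.cnn(frames).flatten(1)   # (batch, 32)
        return self.regressor(feats)

model = EndToEndAffectModel()
pred = model(torch.randn(8, 3, 64, 64))   # (8, 2)
loss = pred.pow(2).mean()                 # placeholder loss, illustration only
loss.backward()  # gradients flow into the CNN, tuning features for this task
```

Features computed separately (the limitation the excerpt describes) would correspond to freezing or bypassing `self.cnn`, so its weights never adapt to the prediction task.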
“…Many researchers have since used RNNs and their LSTM variants to recognize emotion from speech and from video. [36], [37], [38] and [25] all used a Convolutional Neural Network to learn hidden-layer features from individual video frames, along with a recurrency between hidden layers at consecutive times, thus combining the time-independent CNN with an RNN. Many others have used LSTMs to recognize emotions from video data.…”
Section: Discriminative Models (mentioning)
confidence: 99%
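A minimal sketch of that CNN-plus-recurrency pattern (assuming PyTorch; layer sizes and names are hypothetical, not the architectures of [36], [37], [38], or [25]): the same time-independent CNN is applied to every frame, and an LSTM adds recurrency across consecutive time steps:

```python
import torch
import torch.nn as nn

class CnnLstm(nn.Module):
    """Hypothetical CNN+RNN: per-frame CNN features, LSTM across time."""
    def __init__(self, feat_dim=32, hidden=64, n_outputs=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # -> (N, feat_dim, 1, 1)
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, video):                   # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        # Fold time into the batch: the CNN is time-independent.
        feats = self.cnn(video.flatten(0, 1)).flatten(1)  # (B*T, feat_dim)
        out, _ = self.lstm(feats.view(b, t, -1))          # recurrency over T
        return self.head(out)                   # per-frame emotion prediction

pred = CnnLstm()(torch.randn(2, 16, 3, 64, 64))  # (2, 16, 2)
```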
“…In addition, the LSTM model has low performance for predicting movie-induced emotions [31], yet it has achieved leading performance in various emotion recognition tasks due to its ability to model temporal context (e.g. [32]). Ma et al. [31] predict movie-induced emotions at an interval of 10 seconds, which already contains temporal context.…”
Section: Previous Work on the LIRIS-ACCEDE Database (mentioning)
confidence: 99%
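A minimal sketch of segment-level temporal modeling as described in that excerpt (assuming PyTorch; the feature dimensions and segment counts are hypothetical, not Ma et al.'s setup): an LSTM consumes one feature vector per 10-second segment, so each segment's prediction can draw on the context of preceding segments rather than treating segments independently:

```python
import torch
import torch.nn as nn

seg_feat_dim = 128   # assumed size of per-segment audio-visual features
lstm = nn.LSTM(seg_feat_dim, 64, batch_first=True)
head = nn.Linear(64, 1)        # e.g., one induced-valence score per segment

# 4 movies, each represented as 30 ten-second segments of precomputed features.
segments = torch.randn(4, 30, seg_feat_dim)
context, _ = lstm(segments)    # hidden state carries context from past segments
valence = head(context)        # (4, 30, 1): prediction per 10-second interval
```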