2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
DOI: 10.1109/ASPAA.2011.6082328

Learning emotion-based acoustic features with deep belief networks

Abstract: The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task, and as such no dominant feature representation for music emotion recognition has yet emerged. Much of the difficulty in developing emotion-based features is the ambiguity of the ground-truth. Even using the smallest time window, op…

Cited by 70 publications (35 citation statements). References 7 publications.
“…For SER, Stuhlsatz et al. [36] used generatively pre-trained artificial neural networks (ANNs) to learn low-dimensional discriminative features and found improvements in both weighted and unweighted recall on multiple emotion corpora. Schmidt and Kim [33] used deep belief networks (DBNs) to learn high-level features directly from magnitude spectra and achieved good performance on music emotion recognition compared to other feature-extraction schemes. More recently, Le et al. [17] investigated dynamic frame-level modeling with hybrid DBN-HMM classifiers on the FAU Aibo spontaneous emotion corpus [35] and achieved state-of-the-art results on the 5-class problem.…”
Section: B. Feature Extraction (mentioning)
confidence: 99%
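To make the feature-learning idea above concrete, here is a minimal sketch of training a single restricted Boltzmann machine (RBM), the building block of a DBN, on magnitude-spectrum frames with one-step contrastive divergence (CD-1), then using its hidden activations as learned acoustic features. All array shapes, layer sizes, and hyperparameters are illustrative assumptions, not values from Schmidt and Kim [33].

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Single RBM layer; a DBN stacks several of these greedily."""
    def __init__(self, n_visible, n_hidden, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data batch.
        h0 = sigmoid(v0 @ self.W + self.b_h)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = sigmoid(v1 @ self.W + self.b_h)
        # Approximate gradient: positive minus negative statistics.
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

    def transform(self, v):
        # Hidden-unit activations serve as the learned features.
        return sigmoid(v @ self.W + self.b_h)

# Illustrative usage on frames of a normalized magnitude spectrogram.
frames = np.random.rand(256, 513)     # 256 frames x 513 FFT bins (assumed)
rbm = RBM(n_visible=513, n_hidden=50)
for epoch in range(10):
    rbm.cd1_step(frames)
features = rbm.transform(frames)      # input to an emotion classifier/regressor

Stacking further RBMs on these activations, each trained on the layer below, yields the deeper representations that the cited works exploit.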
“…Further examples of deep learning for emotional speech recognition include [7, 25, 26, 1, 21, 29]. In a related way, deep learning has also been successfully applied to emotion recognition in music [31]. Further, a number of works combine video cues such as facial information with speech analysis in a deep learning paradigm, as in [24] or in the winning contribution to the 2013 Emotion in the Wild Challenge held at ACM ICMI [23], which raised the organisers' baseline of 27.5% accuracy to 41.0%.…”
Section: Deep Learning (mentioning)
confidence: 99%
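The audio-visual combination mentioned above is often realized as score-level (late) fusion of per-modality predictions. The sketch below shows the simplest variant, a weighted average of class-probability vectors; the probabilities, weights, and class set are illustrative assumptions, not the architectures used in [23] or [24].

import numpy as np

def late_fusion(p_audio, p_video, w_audio=0.5):
    """Weighted average of per-class probabilities from two modalities."""
    p = w_audio * p_audio + (1.0 - w_audio) * p_video
    return p / p.sum(axis=-1, keepdims=True)  # renormalize to a distribution

p_audio = np.array([0.6, 0.3, 0.1])  # e.g. softmax output of a speech model (assumed)
p_video = np.array([0.2, 0.7, 0.1])  # e.g. softmax output of a facial model (assumed)
print(late_fusion(p_audio, p_video).argmax())  # index of the fused emotion class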
“…This makes it challenging to describe and assess the quality of each layer's representation. Schmidt et al. [15,16] used Deep Belief Networks (DBNs) [17] with three hidden layers to learn emotion-based acoustic representations directly from magnitude spectra. They then selected layer two of the DBN, which performed best in terms of mean error.…”
Section: Gap Between Acoustic Features and Emotion (mentioning)
confidence: 99%
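The layer-selection procedure described above can be sketched as follows: greedily stack RBMs into a three-hidden-layer DBN, then score each layer's features by the mean error of a simple regressor predicting per-frame valence-arousal targets. Layer sizes, the random data, and the scoring model are illustrative assumptions; scikit-learn's BernoulliRBM stands in for a custom RBM implementation, and proper train/validation splits are omitted for brevity.

import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.random((500, 513))   # magnitude-spectrum frames (assumed shape)
y = rng.random((500, 2))     # per-frame valence/arousal targets (assumed)

# Greedy layer-wise pre-training: each RBM is fit on the activations
# produced by the layer beneath it.
layer_features, inputs = [], X
for n_hidden in (100, 100, 100):    # three hidden layers, sizes assumed
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, random_state=0)
    inputs = rbm.fit_transform(inputs)
    layer_features.append(inputs)

# Score each layer's representation with a linear regressor; Schmidt et
# al. report that the second layer performed best in their experiments.
for i, feats in enumerate(layer_features, start=1):
    pred = LinearRegression().fit(feats, y).predict(feats)
    print(f"layer {i}: mean error = {mean_absolute_error(y, pred):.4f}")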