2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6639126
Multiple windowed spectral features for emotion recognition

Abstract: MFCC (Mel-frequency cepstral coefficients) and PLP (perceptual linear prediction) or RASTA-PLP features have demonstrated good results, whether used in combination with prosodic features as suprasegmental (long-term) information or used stand-alone as segmental (short-time) information. MFCC and PLP parameterization aims to represent the speech parameters in a way similar to how sound is perceived by humans. However, MFCC and PLP are usually computed from a Hamming-windowed periodo…
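The abstract's contrast is between a single Hamming-windowed periodogram and a multiple-windowed (multitaper) spectrum estimate. A minimal sketch of the two, using SciPy's DPSS (Thomson) tapers; the frame length, `NW`, and number of tapers here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy.signal import windows

def hamming_periodogram(frame):
    """Single-taper estimate: one Hamming-windowed periodogram."""
    w = windows.hamming(len(frame), sym=False)
    X = np.fft.rfft(frame * w)
    return np.abs(X) ** 2 / len(frame)

def multitaper_spectrum(frame, n_tapers=6, nw=4.0):
    """Multi-taper estimate: average the periodograms obtained with
    K orthogonal DPSS windows, which reduces estimator variance
    relative to a single window."""
    tapers = windows.dpss(len(frame), NW=nw, Kmax=n_tapers)  # shape (K, N)
    spectra = np.abs(np.fft.rfft(frame * tapers, axis=1)) ** 2 / len(frame)
    return spectra.mean(axis=0)

# Toy frame: a sinusoid in noise.
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 0.1 * np.arange(256)) + 0.5 * rng.standard_normal(256)
S_single = hamming_periodogram(frame)
S_multi = multitaper_spectrum(frame)
```

In a feature pipeline, the multitaper estimate would simply replace the single periodogram before the Mel filterbank (for MFCC) or Bark-scale analysis (for PLP).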

Cited by 24 publications (9 citation statements); references 26 publications.
“…With speaker-specific z-normalization, we obtained a UAR of 46.36%. Respectively, these results are significantly better than 44.0% and 44.8% UAR, the current state of the art without [3] and with [4] speaker normalization (one-tailed binomial test, p ≈ 0.002).…”
Section: Introduction
confidence: 84%
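The statement above attributes the gain to speaker-specific z-normalization, i.e., standardizing each feature dimension using only that speaker's statistics. A minimal sketch (the function name and the variance guard are illustrative assumptions):

```python
import numpy as np

def speaker_znorm(features, speaker_ids):
    """Z-normalize each feature dimension per speaker:
    subtract the speaker's mean and divide by the speaker's
    standard deviation, so features become zero-mean and
    unit-variance within each speaker."""
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        idx = speaker_ids == spk
        mu = features[idx].mean(axis=0)
        sigma = features[idx].std(axis=0) + 1e-8  # guard against zero variance
        out[idx] = (features[idx] - mu) / sigma
    return out
```

This removes per-speaker offsets (e.g., habitual pitch or energy level) so the classifier sees deviations that are more likely to reflect emotion than speaker identity.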
“…Hassan et al [8] achieved a 42.7% UAR by applying importance weights within an SVM to compensate for differences between training and testing conditions. Attabi et al [3] used GMMs to model multiple windowed spectrum estimates of Perceptual Linear Prediction (PLP) coefficients, resulting in a 44.0% UAR. The best known result, 44.8% UAR, was achieved with a two-pass system in which a high-level SVM classified each test utterance using ranking scores obtained from five low-level SVMs, one for each emotion [4].…”
Section: Aibo Benchmark
confidence: 99%
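All the benchmark numbers quoted above are UAR (unweighted average recall): the mean of per-class recalls, so rare emotion classes count as much as frequent ones. A short sketch of the metric (function name is an illustrative assumption; scikit-learn's `balanced_accuracy_score` computes the same quantity):

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """UAR: average the recall of each class with equal weight,
    regardless of how many samples the class has."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c
        recalls.append(np.mean(y_pred[mask] == c))  # recall for class c
    return float(np.mean(recalls))
```

On the heavily imbalanced FAU Aibo corpus this is why UAR, rather than plain accuracy, is the standard figure of merit.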
“…viii) Spectral Slope SSL(m): it is a measure of voice quality found using linear regression, given by Eq. (16), and it represents the rate of decrease of spectral amplitude based on human perception.…”
Section: Spectral Features
confidence: 99%
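Eq. (16) of the cited work is not reproduced in the excerpt; assuming the standard least-squares definition, spectral slope is the slope of a straight line fitted to spectral magnitude as a function of frequency (a sketch, not the cited paper's exact formula):

```python
import numpy as np

def spectral_slope(magnitudes, freqs):
    """Spectral slope: slope of the least-squares line fit to
    magnitude vs. frequency. Negative values indicate spectral
    amplitude decreasing with frequency."""
    freqs = np.asarray(freqs, dtype=float)
    mags = np.asarray(magnitudes, dtype=float)
    slope, _intercept = np.polyfit(freqs, mags, deg=1)
    return slope
```

A steeper (more negative) slope corresponds to less high-frequency energy, which is why the feature is used as a voice-quality correlate.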
“…Multi-tapers have been widely used recently for speaker recognition and verification purposes [12][13][14]. In [15], the authors applied multi-tapers for emotion recognition purposes, but using only MFCC and perceptual linear prediction (PLP) features. In this paper, various spectral features are used from both conventional and multi-taper spectral estimates to recognize speech emotions.…”
Section: Introduction
confidence: 99%
“…The multitaper approach has been used in several domains, including geophysical applications [11], speaker verification [12], [13], and emotion recognition [14], [15], and it has been shown to improve the performance and robustness of different systems. However, this method has not been used in stressed-speech recognition applications.…”
Section: Introduction
confidence: 99%