2019
DOI: 10.3390/app9122470

Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features

Abstract: The Mel frequency cepstral coefficients (MFCC), the most widely used and best-known acoustic features of a speech signal, cannot sufficiently characterize emotions in speech when classification targets both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason is that some discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but are…
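The abstract takes MFCCs as the baseline feature set. For readers unfamiliar with them, here is a minimal NumPy sketch of a conventional MFCC pipeline (pre-emphasis, framing, power spectrum, mel filterbank, log, DCT). The parameter values (16 kHz audio, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, 0.97 pre-emphasis) are common textbook defaults assumed for illustration; they are not the configuration reported by Tursunov et al., and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed variant; other mel formulas exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose centre frequencies are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):  # rising edge
            fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            fb[i - 1, k] = (right - k) / (right - centre)
    return fb

def mfcc(signal, sr, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each (zero-padded) frame.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, log-compressed (small offset avoids log(0)).
    log_energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # 5. Orthonormal DCT-II decorrelates the log energies; keep the first n_ceps.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_energies @ dct.T  # shape: (n_frames, n_ceps)
```

For one second of 16 kHz audio these defaults yield 98 frames of 13 coefficients each. For real recordings, a tested library implementation such as `librosa.feature.mfcc` is preferable to this hand-rolled sketch.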

Cited by 26 publications (9 citation statements). References: 48 publications.
“…Furthermore, ref. [13] presented a SER based on the key segments to reduce the cost and increase the accuracy utilizing three challenging speech emotion datasets [41]. However, in this research, we proposed a novel CNN architecture for the SER based on the deep frequency pixel features to recognize the emotional features in the speech spectrograms.…”
Section: Methods (mentioning; confidence: 99%)
“…In studies exploring this question, participants were asked to listen to continuous speech samples and rate how positive or negative they sounded. The emotional valence of a speech signal is a complex combination of several acoustic features including, but not limited to, speech tempo, pitch height and range, and intensity (see Liscombe et al., 2003; Tursunov et al., 2019). While the perceived affect of a speech register can be directly related to speakers' desire to transmit emotion or to their communicative intent, it can also be a by-product of the exaggeration of prosodic and acoustic components intended to enhance a register's clarity or its didactic purpose.…”
Section: Emotional Valence of FDS (mentioning; confidence: 99%)
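The statement above names pitch height, pitch range and intensity among the acoustic cues to valence. As a rough illustration of how such prosodic measures can be computed, here is a minimal NumPy sketch using frame-wise autocorrelation pitch and RMS intensity. Everything here is an assumption for illustration: the function names are hypothetical, the 40 ms frames (640 samples at 16 kHz) and the 75 to 400 Hz pitch search range are arbitrary choices, there is no voicing detection, and tempo is not covered. It is not the method of Liscombe et al. or Tursunov et al.

```python
import numpy as np

def rms_db(frame, eps=1e-12):
    # Intensity proxy: RMS amplitude in dB relative to a full-scale value of 1.0.
    return 20.0 * np.log10(np.sqrt(np.mean(frame ** 2)) + eps)

def autocorr_pitch(frame, sr, fmin=75.0, fmax=400.0):
    # Naive autocorrelation pitch estimate: the lag (within an assumed speech
    # F0 range) at which the frame best matches a shifted copy of itself.
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(frame) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag

def prosodic_summary(signal, sr, frame_len=640, hop=320):
    # Frame-wise pitch and intensity, summarised as pitch height (mean),
    # pitch range (max - min) and mean intensity. Every frame is treated as
    # voiced, which real speech would violate.
    pitches, levels = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        pitches.append(autocorr_pitch(frame, sr))
        levels.append(rms_db(frame))
    pitches = np.array(pitches)
    return {
        "pitch_mean_hz": float(np.mean(pitches)),
        "pitch_range_hz": float(np.max(pitches) - np.min(pitches)),
        "intensity_mean_db": float(np.mean(levels)),
    }
```

On a steady 200 Hz tone this reports a mean pitch of about 200 Hz and a pitch range near zero; on real speech, a dedicated pitch tracker with a voicing decision would be needed before the summary statistics are meaningful.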
“…Among the timbral measures suggested by Tursunov et al. (2019), spectral descriptors seemed particularly relevant. In addition, timbral descriptors were clearly more helpful to distinguish positive from neutral utterances than the other pairs (positive-negative and negative-neutral).…”
Section: Notes on Acoustic Validation Results (mentioning; confidence: 99%)
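The citing study above singles out spectral descriptors among the timbral measures. The following minimal NumPy sketch computes four widely used spectral timbre descriptors (centroid, spread, flatness and roll-off) for a single frame. It is an assumption that these correspond to the descriptors used by Tursunov et al.; the function name, the Hann window and the 85% roll-off threshold are illustrative choices.

```python
import numpy as np

def spectral_descriptors(frame, sr, rolloff_pct=0.85, eps=1e-12):
    # Hann window reduces spectral leakage before taking the magnitude spectrum.
    windowed = frame * np.hanning(len(frame))
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    power = mag ** 2
    # Spectral centroid: magnitude-weighted mean frequency ("brightness").
    centroid = np.sum(freqs * mag) / (np.sum(mag) + eps)
    # Spectral spread: magnitude-weighted standard deviation around the centroid.
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * mag) / (np.sum(mag) + eps))
    # Spectral flatness: geometric mean / arithmetic mean of the power spectrum
    # (close to 0 for a pure tone, larger for noise-like signals).
    flatness = np.exp(np.mean(np.log(power + eps))) / (np.mean(power) + eps)
    # Spectral roll-off: lowest frequency below which rolloff_pct of the energy lies.
    cumulative = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * cumulative[-1])]
    return {
        "centroid_hz": float(centroid),
        "spread_hz": float(spread),
        "flatness": float(flatness),
        "rolloff_hz": float(rolloff),
    }
```

A 1 kHz tone gives a centroid and roll-off near 1000 Hz and a flatness near zero, while white noise gives a much higher centroid and flatness. In practice such descriptors are usually computed on short frames (for example 25 ms) and summarized per utterance (mean, standard deviation) before classification.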