2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD) 2019
DOI: 10.1109/sped.2019.8906537

Modulation-based Speech Emotion Recognition with Reconstruction Error Feature Expansion

Cited by 4 publications (3 citation statements)
References 16 publications
“…The former consists of applying amplitude normalization and 7-sample median filtering to each utterance detected by the VAD system, as well as framing the signal using Hamming windows of 25 ms duration with a 15 ms overlap. The feature set used is an extension of the ComParE feature set [28], and also includes the modulation-based features (MBFs) proposed in [34] and utilized successfully in our previous work on speech emotion recognition [35], as well as two utterance-wise prosodic features (UPFs): utterance duration and leading pause duration, i.e., the time interval between the end of the previous utterance and the start of the current one, both shown as relevant for the DSD task [19, 20].…”
Section: System Architecture (mentioning)
confidence: 99%
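The preprocessing described in the statement above (amplitude normalization, 7-sample median filtering, and 25 ms Hamming frames with a 15 ms overlap) can be illustrated with a minimal NumPy sketch. This is not the citing authors' implementation. The function name `preprocess_utterance`, the 16 kHz sample rate, the choice of peak normalization, the edge padding in the median filter, and the order of the two steps are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def preprocess_utterance(signal, sample_rate=16000, frame_ms=25.0,
                         overlap_ms=15.0, median_width=7):
    """Hypothetical helper: normalize, median-filter and frame one utterance.

    Returns an array of shape (n_frames, frame_len) of Hamming-windowed frames.
    """
    x = np.asarray(signal, dtype=np.float64)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    overlap = int(round(sample_rate * overlap_ms / 1000.0))
    hop = frame_len - overlap  # 25 ms frames with 15 ms overlap -> 10 ms hop

    if x.size < frame_len:
        # Too short to yield a single full frame.
        return np.empty((0, frame_len))

    # Amplitude normalization; peak normalization is an assumption here.
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak

    # Median filter over `median_width` samples; edges are padded by
    # repeating the boundary samples (an assumption) so length is preserved.
    half = median_width // 2
    padded = np.pad(x, half, mode="edge")
    x = np.median(sliding_window_view(padded, median_width), axis=1)

    # Slice overlapping frames and apply a Hamming window to each one.
    frames = sliding_window_view(x, frame_len)[::hop]
    return frames * np.hamming(frame_len)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(16000)
    print(preprocess_utterance(one_second).shape)
```

With these assumed settings a frame spans 400 samples and successive frames start 160 samples apart. Trailing samples that do not fill a complete frame are dropped rather than zero-padded, which is one of several reasonable conventions.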
“…Although the latter examples are the focus of this work, most SER research focuses on other simpler, more general applied fields, e.g., human-machine interfaces, virtual assistants, affective speech synthesis, etc. [1]. Specifically, this work approaches the SER task in relation to monitoring suspicious behavior for applications such as computer-aided conducting of interviews or questionings carried out by law enforcement organizations, surveillance, criminal or terrorist act prevention, etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…In our previous work on SER [1], MLP-based systems were used with small input feature sets, tested on a single dataset. The main contributions of the present work include:…”
Section: Introduction (mentioning)
confidence: 99%