Modulation-based Speech Emotion Recognition with Reconstruction Error Feature Expansion

Mihalache, Serban; Burileanu, Dragoş; Pop, Gheorghe Ioan; Burileanu, Corneliu

doi:10.1109/sped.2019.8906537

Cited by 4 publications

(3 citation statements)

References 16 publications


“…The former consists of applying amplitude normalization and 7-sample median filtering to each utterance detected by the VAD system, as well as framing the signal using Hamming windows of 25 ms duration with a 15 ms overlap. The feature set used is an extension of the ComParE feature set [28], and also includes the modulation-based features (MBFs) proposed in [34] and utilized successfully in our previous work on speech emotion recognition [35], as well as two utterance-wise prosodic features (UPFs): utterance duration and leading pause duration, i.e., the time interval between the end of the previous utterance and the start of the current one, both shown as relevant for the DSD task [19, 20].…”

Section: System Architecture (mentioning)

confidence: 99%
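The preprocessing described in the quoted passage (amplitude normalization, 7-sample median filtering, and framing with 25 ms Hamming windows overlapping by 15 ms, i.e. a 10 ms hop) and the two utterance-wise prosodic features can be sketched in Python as follows. This is a minimal sketch, not the authors' code: peak normalization, zero-padded median filtering, the 16 kHz example sampling rate, measuring the first utterance's leading pause from t = 0, and all function names are assumptions.

```python
import numpy as np


def normalize_amplitude(x):
    """Peak normalization to [-1, 1] (assumed; the source does not specify the method)."""
    x = np.asarray(x, dtype=np.float64)
    peak = np.max(np.abs(x)) if x.size else 0.0
    return x / peak if peak > 0 else x


def median_filter(x, length=7):
    """Sliding median over `length` samples, zero-padded at both ends."""
    half = length // 2
    padded = np.pad(np.asarray(x, dtype=np.float64), half)
    windows = np.lib.stride_tricks.sliding_window_view(padded, length)
    return np.median(windows, axis=1)


def frame_signal(x, sr, win_ms=25.0, overlap_ms=15.0):
    """Hamming-windowed frames; 25 ms windows with 15 ms overlap give a 10 ms hop."""
    win = int(round(sr * win_ms / 1000))
    hop = win - int(round(sr * overlap_ms / 1000))
    if len(x) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(win)


def preprocess(utterance, sr):
    """Normalization, then 7-sample median filter, then framing, in the quoted order."""
    return frame_signal(median_filter(normalize_amplitude(utterance), 7), sr)


def utterance_prosodic_features(segments):
    """(duration, leading pause) in seconds for sorted VAD segments (start_s, end_s).

    The leading pause is the gap between the end of the previous utterance and the
    start of the current one; for the first utterance it is measured from t = 0
    (assumption).
    """
    feats, prev_end = [], 0.0
    for start, end in segments:
        feats.append((end - start, start - prev_end))
        prev_end = end
    return feats


if __name__ == "__main__":
    sr = 16000
    frames = preprocess(np.sin(np.linspace(0, 200 * np.pi, sr)), sr)
    print(frames.shape)  # (98, 400): 400-sample windows with a 160-sample hop
    print(utterance_prosodic_features([(0.5, 2.0), (2.75, 4.0)]))
```

Zero padding at the edges matches the documented default of `scipy.signal.medfilt`; a reflect or edge pad would be an equally reasonable choice, since the quoted passage does not say how boundaries are handled.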

Using Voice Activity Detection and Deep Neural Networks with Hybrid Speech Feature Extraction for Deceptive Speech Detection

Mihalache

Burileanu

2022

Sensors

Self Cite


In this work, we first propose a deep neural network (DNN) system for the automatic detection of speech in audio signals, otherwise known as voice activity detection (VAD). Several DNN types were investigated, including multilayer perceptrons (MLPs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), with the best performance being obtained for the latter. Additional postprocessing techniques, i.e., hysteretic thresholding, minimum duration filtering, and bilateral extension, were employed in order to boost performance. The systems were trained and tested using several data subsets of the CENSREC-1-C database, with different simulated ambient noise conditions, and additional testing was performed on a different CENSREC-1-C data subset containing actual ambient noise, as well as on a subset of the TIMIT database. An accuracy of up to 99.13% was obtained for the CENSREC-1-C datasets, and 97.60% for the TIMIT dataset. We proceed to show how the final VAD system can be adapted and employed within an utterance-level deceptive speech detection (DSD) processing pipeline. The best DSD performance is achieved by a novel hybrid CNN-MLP network leveraging a fusion of algorithmically and automatically extracted speech features, and reaches an unweighted accuracy (UA) of 63.7% on the RLDD database, and 62.4% on the RODeCAR database.
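The three VAD postprocessing steps named in the abstract (hysteretic thresholding, minimum duration filtering, and bilateral extension), together with the unweighted accuracy (UA) metric used for the DSD results, can be illustrated with the Python sketch below, operating on frame-level speech probabilities. The thresholds (0.6 to enter speech, 0.4 to leave it), the 3-frame minimum duration, the 2-frame extension, and all function names are hypothetical; the paper's actual values and implementation may differ. UA is computed here as the mean per-class recall, its usual definition in computational paralinguistics.

```python
import numpy as np


def hysteresis_threshold(probs, high=0.6, low=0.4):
    """Enter speech when p >= high; stay in speech until p drops below low."""
    out = np.zeros(len(probs), dtype=bool)
    active = False
    for i, p in enumerate(probs):
        active = p >= low if active else p >= high
        out[i] = active
    return out


def to_segments(mask):
    """Convert a boolean frame mask into half-open (start, end) frame segments."""
    segs, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i
        elif not m and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(mask)))
    return segs


def postprocess(probs, high=0.6, low=0.4, min_frames=3, ext_frames=2):
    """Hysteresis, then drop short segments, then extend survivors on both sides."""
    mask = hysteresis_threshold(probs, high, low)
    kept = [(s, e) for s, e in to_segments(mask) if e - s >= min_frames]
    n = len(probs)
    out = np.zeros(n, dtype=bool)
    for s, e in kept:
        out[max(0, s - ext_frames):min(n, e + ext_frames)] = True
    return out


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.5, 0.45, 0.3, 0.1, 0.1, 0.1, 0.1, 0.65, 0.1, 0.1]
    print(postprocess(probs).astype(int))
    print(unweighted_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
```

With these toy values, frame 3 (p = 0.45) stays inside the speech segment thanks to hysteresis, the isolated spike at frame 9 is discarded by the minimum-duration filter, and the kept segment is widened by up to two frames on each side. In the UA example, plain accuracy would be 0.75, while UA is (2/3 + 1) / 2, so the minority class weighs as much as the majority one.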

“…Although the latter examples are the focus of this work, most SER research focuses on other simpler, more general applied fields, e.g., human-machine interfaces, virtual assistants, affective speech synthesis, etc. [1]. Specifically, this work approaches the SER task in relation to monitoring suspicious behavior for applications such as computer-aided conducting of interviews or questionings carried out by law enforcement organizations, surveillance, criminal or terrorist act prevention, etc.…”

Section: Introduction (mentioning)

confidence: 99%

“…In our previous work on SER [1], MLP-based systems were used with small input feature sets, tested on a single dataset. The main contributions of the present work include:…”

Section: Introduction (mentioning)

confidence: 99%

Speech Emotion Recognition Using Deep Neural Networks, Transfer Learning, and Ensemble Classification Techniques

Mihalache

BURILEANU

2023

ROMJIST

Self Cite


Speech emotion recognition (SER) is the task of determining the affective content present in speech, a promising research area of great interest in recent years, with important applications especially in the field of forensic speech and law enforcement operations, among others. In this paper, systems based on deep neural networks (DNNs) spanning five levels of complexity are proposed, developed, and tested, including systems leveraging transfer learning (TL) for the top modern image recognition deep learning models, as well as several ensemble classification techniques that lead to significant performance increases. The systems were tested on the most relevant SER datasets: EMODB, CREMAD, and IEMOCAP, in the context of: (i) classification: using the standard full sets of emotion classes, as well as additional negative emotion subsets relevant for forensic speech applications; and (ii) regression: using the continuously valued 2D arousal-valence affect space. The proposed systems achieved state-of-the-art results for the full class subset for EMODB (up to 83% accuracy) and performance comparable to other published research for the full class subsets for CREMAD and IEMOCAP (up to 55% and 62% accuracy, respectively). For the class subsets focusing only on negative affective content, the proposed solutions offered top performance compared with previously published state-of-the-art results.
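The abstract does not name the ensemble classification techniques it uses, so the Python sketch below shows two common ones purely as assumptions: hard (majority) voting over each model's predicted labels, and soft voting over averaged, optionally weighted, class probabilities. Function names and the toy probabilities are hypothetical.

```python
import numpy as np


def hard_vote(labels, n_classes):
    """Majority vote over labels of shape (n_models, n_samples).

    Ties are resolved in favour of the lowest class index (argmax behaviour).
    """
    labels = np.asarray(labels)
    n_samples = labels.shape[1]
    counts = np.zeros((n_classes, n_samples), dtype=int)
    for model_labels in labels:
        # Each model casts exactly one vote per sample.
        counts[model_labels, np.arange(n_samples)] += 1
    return counts.argmax(axis=0)


def soft_vote(probs, weights=None):
    """Average class probabilities of shape (n_models, n_samples, n_classes)."""
    avg = np.average(np.asarray(probs, dtype=np.float64), axis=0, weights=weights)
    return avg.argmax(axis=-1)


if __name__ == "__main__":
    probs = np.array([
        [[0.60, 0.40], [0.20, 0.80]],  # model A
        [[0.30, 0.70], [0.40, 0.60]],  # model B
        [[0.55, 0.45], [0.95, 0.05]],  # model C
    ])
    print(hard_vote(probs.argmax(axis=-1), n_classes=2))
    print(soft_vote(probs))
```

On this toy input the two schemes disagree on both samples: on the first, model B's confident 0.70 outweighs the weak majority for class 0 once probabilities are averaged; on the second, model C's 0.95 overturns the majority for class 1. Which behaviour is preferable depends on how well calibrated the individual models are.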


Artificial Intelligence Fights Crime and Terrorism at a New Level

Ionescu

Ghenescu

Răstoceanu

et al. 2020

IEEE MultiMedia

