Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding

Lech, Margaret; Stolar, Melissa N.; Best, Christopher; Bolia, Robert S.

doi:10.3389/fcomp.2020.00014

Cited by 84 publications

(38 citation statements)

References 24 publications

“…Since the required input size for ResNet-18 was 224 x 224 pixels, the original image arrays of 257 x 259 pixels were resized. Consistent with [23, 25, 26], the resizing was very small, causing no significant distortion to the spectrogram images and having no effect on the model training results. Each color component of the RGB images was passed as an input to a separate channel of ResNet-18.…”

Section: Generation Of Spectrogram Images (mentioning)

confidence: 67%

Prediction of Inter-Personal Trust and Team Familiarity From Speech: A Double Transfer Learning Approach

et al. 2020

Self Cite
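The resizing step described in the quoted statement can be illustrated with a minimal sketch. It assumes nearest-neighbour sampling and a layout of 257 rows by 259 columns, neither of which the statement specifies; the function name `resize_nearest` is hypothetical, and the citing paper may have used a different interpolation method.

```python
def resize_nearest(image, out_rows=224, out_cols=224):
    """Resize a 2-D array (a list of rows) using nearest-neighbour sampling.

    Assumption: the interpolation method is not given in the quoted text;
    nearest-neighbour is used here only because it is the simplest choice.
    """
    in_rows = len(image)
    in_cols = len(image[0])
    resized = []
    for r in range(out_rows):
        # Map each output row back to the closest source row.
        src_r = min(in_rows - 1, r * in_rows // out_rows)
        row = image[src_r]
        resized.append(
            [row[min(in_cols - 1, c * in_cols // out_cols)] for c in range(out_cols)]
        )
    return resized


# One colour channel of a 257 x 259 spectrogram image, filled with dummy values.
channel = [[(r + c) % 256 for c in range(259)] for r in range(257)]
small = resize_nearest(channel)
print(len(small), len(small[0]))
```

For an RGB spectrogram, the same function would be applied to each of the three colour channels before they are passed to the separate input channels of the network.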

“…Waveform remainders that did not fill up to the 1-second frame length were discarded. The block duration of 1 second was consistent with the previously reported duration used in speech-based prediction of speaker's states [24, 25, 26]. The stride time between subsequent blocks was chosen in an arbitrary way.…”

Section: Splitting Into Blocks (mentioning)

confidence: 98%

Prediction of Inter-Personal Trust and Team Familiarity From Speech: A Double Transfer Learning Approach

et al. 2020

Self Cite
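The block-splitting step described in the quoted statement can be sketched as follows. The 1-second block length and the discarding of short remainders come from the statement; the stride of 0.5 seconds is a hypothetical value (the statement only says it was chosen arbitrarily), and the function name `split_into_blocks` is likewise an assumption.

```python
def split_into_blocks(samples, sample_rate, block_seconds=1.0, stride_seconds=0.5):
    """Split a waveform into fixed-length blocks, dropping any short remainder.

    stride_seconds is a hypothetical default; the quoted text does not
    report the stride that was actually used.
    """
    block_len = int(round(block_seconds * sample_rate))
    stride = int(round(stride_seconds * sample_rate))
    if block_len <= 0 or stride <= 0:
        raise ValueError("block and stride must each cover at least one sample")

    blocks = []
    start = 0
    # Only keep blocks that are completely filled; the tail is discarded.
    while start + block_len <= len(samples):
        blocks.append(samples[start:start + block_len])
        start += stride
    return blocks


# Toy waveform: 3.8 seconds at a sample rate of 10 Hz.
waveform = list(range(38))
blocks = split_into_blocks(waveform, sample_rate=10)
print(len(blocks), len(blocks[0]))
```

With these toy values the last full block ends at sample 35, so the final three samples are dropped rather than padded.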

“…Computer Science (Haider et al (2020)). An implementation of real-time voice emotion identification using AlexNet was described in (Lech et al (2020)). When trained on the Berlin Emotional Speech (EMO-DB) database with six emotional classes, the presented method obtained an average accuracy of 82%.…”

Section: Speech Emotion Recognition Using Deep Learning Approaches (mentioning)

confidence: 99%

Peer Review #2 of "Effect on speech emotion classification of a feature selection approach using a convolutional neural network (v0.2)"

Humayun

2021

“…With the intensive development and application of artificial intelligence based solutions in human daily life, automatic speech emotion recognition (SER) is gaining ever-increasing attention by the scientific community [1]- [3]. At the same time, there are already commercial solutions utilizing such technology, e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Language-agnostic speech anger identification

Saitta

Ntalampiras

2021

2021 44th International Conference on Telecommunications and Signal Processing (TSP)

Following the constantly increasing adoption of affective computing based solutions, this paper investigates the feasibility of multilingual anger identification. To this end, we formed such a corpus by suitably combining seven different datasets representing five different languages, i.e. English, German, Italian, Urdu, and Persian. After analyzing the diverse characteristics of the datasets, we designed four classification algorithms, namely Support Vector Machine, Decision Tree-based Bagging scheme, Convolutional Neural Network, and Convolutional Recurrent Neural Network. Such classification mechanisms are trained on appropriate features extracted from time and/or frequency domains, while speech data have been balanced considering every diverse characteristic incorporated in the datasets (language, sex, acted, etc.). Our findings render multilingual anger identification feasible since the proposed audio pattern recognition methodology based on Mel-spectrograms and CRNN achieved quite satisfactory identification rates.
