2020
DOI: 10.3389/fcomp.2020.00014

Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding

citations
Cited by 84 publications
(38 citation statements)
references
References 24 publications
“…Since the required input size for ResNet-18 was 224 x 224 pixels, the original image arrays of 257 x 259 pixels were resized. Consistent with [23, 25, 26], the resizing was very small, causing no significant distortion to the spectrogram images and having no effect on the model training results. Each color component of the RGB images was passed as an input to a separate channel of ResNet-18.…”
Section: Generation Of Spectrogram Images (mentioning)
confidence: 67%
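
The resizing step quoted above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the interpolation method is not stated in the quote, so nearest-neighbour sampling is an assumption, as are the helper names `resize_nearest` and `to_channels_first`, the reading of 257 x 259 as rows x columns, and the random stand-in image.

```python
import numpy as np

def resize_nearest(image, out_h=224, out_w=224):
    """Nearest-neighbour resize of an (H, W, C) array.

    The interpolation method is an assumption; the quoted text only says
    that the 257 x 259 arrays were resized to 224 x 224.
    """
    in_h, in_w = image.shape[:2]
    # Map every output row/column index back to a source index.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]

def to_channels_first(image):
    """Reorder (H, W, C) to (C, H, W) so each colour component is one input channel."""
    return np.transpose(image, (2, 0, 1))

# Hypothetical stand-in for one RGB spectrogram image (257 rows x 259 columns).
spectrogram = np.random.default_rng(0).integers(
    0, 256, size=(257, 259, 3), dtype=np.uint8
)
resized = resize_nearest(spectrogram)
network_input = to_channels_first(resized)
print(network_input.shape)
```

Because the source and target sizes differ by only about 13%, most source pixels survive the resampling, which is in line with the quoted claim that the resizing caused no significant distortion.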
“…Waveform remainders that did not fill a 1-second frame were discarded. The block duration of 1 second was consistent with the previously reported duration used in speech-based prediction of speaker's states [24, 25, 26]. The stride time between subsequent blocks was chosen in an arbitrary way.…”
Section: Splitting Into Blocks (mentioning)
confidence: 98%
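
The block-splitting procedure quoted above can be sketched as follows. This is a hedged illustration only: the function name `split_into_blocks`, the 16 kHz sample rate, and the 0.5-second stride are assumptions (the quote says only that the stride was chosen arbitrarily), and the ramp signal stands in for real audio.

```python
import numpy as np

def split_into_blocks(waveform, sample_rate, block_seconds=1.0, stride_seconds=0.5):
    """Cut a 1-D waveform into fixed-length blocks and drop any short remainder.

    block_seconds follows the quoted 1-second duration; stride_seconds is a
    hypothetical value, since the source gives no specific stride.
    """
    block_len = int(round(block_seconds * sample_rate))
    stride = int(round(stride_seconds * sample_rate))
    if block_len <= 0 or stride <= 0:
        raise ValueError("block and stride must be at least one sample long")
    if len(waveform) < block_len:
        # Not even one full block: everything is a remainder and is discarded.
        return np.empty((0, block_len), dtype=waveform.dtype)
    n_blocks = 1 + (len(waveform) - block_len) // stride
    starts = np.arange(n_blocks) * stride
    return np.stack([waveform[s:s + block_len] for s in starts])

# Hypothetical 2.7-second recording sampled at 16 kHz.
sample_rate = 16000
waveform = np.arange(int(2.7 * sample_rate), dtype=np.float32)
blocks = split_into_blocks(waveform, sample_rate)
print(blocks.shape)
```

With these assumed values, the trailing samples that do not fill a complete 1-second block are simply left out of the returned array.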
“…Computer Science (Haider et al. (2020)). An implementation of real-time voice emotion identification using AlexNet was described in (Lech et al. (2020)). When trained on the Berlin Emotional Speech (EMO-DB) database with six emotional classes, the presented method obtained an average accuracy of 82%.…”
Section: Speech Emotion Recognition Using Deep Learning Approaches (mentioning)
confidence: 99%
“…With the intensive development and application of artificial intelligence based solutions in human daily life, automatic speech emotion recognition (SER) is gaining ever-increasing attention from the scientific community [1]-[3]. At the same time, there are already commercial solutions utilizing such technology, e.g.…”
Section: Introduction (mentioning)
confidence: 99%