2018
DOI: 10.25046/aj030437
Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images

Abstract: Automatic speech emotion recognition (SER) techniques based on acoustic analysis show high confusion between certain emotional categories. This study used an indirect approach to provide insights into the amplitude-frequency characteristics of different emotions, in order to support the development of future, more efficiently differentiating SER methods. The analysis was carried out by transforming short 1-second blocks of speech into RGB or grey-scale images of spectrograms. The images were used to fine-tune a…
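The abstract's pipeline (1-second speech block → grey-scale spectrogram image) can be sketched with plain NumPy. The window size, hop length, and 80 dB dynamic range below are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def block_to_spectrogram_image(block, n_fft=512, hop=128):
    """Turn a 1-second mono block into a grey-scale spectrogram image.

    n_fft and hop are hypothetical values; the excerpt does not give
    the paper's exact STFT parameters.
    """
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(block) - n_fft + 1, hop):
        frames.append(np.abs(np.fft.rfft(block[start:start + n_fft] * window)))
    mag = np.array(frames).T                      # (freq bins, time frames)
    db = 20.0 * np.log10(mag + 1e-10)             # log-amplitude spectrogram
    db = np.clip(db, db.max() - 80.0, db.max())   # limit dynamic range to 80 dB
    # Normalize to 0..255 so the array can be saved as an 8-bit image
    img = (255.0 * (db - db.min()) / max(np.ptp(db), 1e-12)).astype(np.uint8)
    return img

# 1 second of a synthetic "speech-like" test signal: a 300 Hz tone with vibrato
sr = 16000
t = np.arange(sr) / sr
block = np.sin(2 * np.pi * (300 + 20 * np.sin(2 * np.pi * 5 * t)) * t)
img = block_to_spectrogram_image(block)
print(img.shape, img.dtype)
```

Such an image array can then be fed to a pretrained CNN for fine-tuning, which is the transfer-learning step the abstract describes.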

Cited by 24 publications (28 citation statements)
References: 34 publications
“…Since, to the best of our knowledge, this work represents the first attempt to use a spectrogram-based approach with crowd emotional sounds, there are no datasets or results in the literature against which to compare our results. To give a general idea of the algorithm's performance, we can provide a coarse-grained comparison with the same approach applied to individual emotional speech [25], keeping in mind that the specific features of the two cases are not strictly comparable. On crowd sounds, performance improves by 10% on average.…”
Section: Results
confidence: 99%
“…This choice can potentially affect the analysis results because different scales emphasize different frequency ranges, so different components of the sounds in a crowd sound block tend to emerge under different scales. In this work, we systematically analyze four frequency scales, chosen for their intrinsic characteristics, each emphasizing a progressively lower frequency range:
- Mel [34], for the 4–6 kHz range
- ERB [28], for the 2–4 kHz range
- Bark [37], for the 0–3.5 kHz range
- Log [25], for the 0.02–2 kHz range …”
Section: The System Architecture Workflow
confidence: 99%
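The four scales differ in how strongly they warp frequency. A minimal sketch of the standard closed-form conversions (textbook formulas; the cited papers may use slightly different variants):

```python
import math

def hz_to_mel(f):
    # O'Shaughnessy mel formula
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_erb_rate(f):
    # Glasberg & Moore ERB-rate scale
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def hz_to_bark(f):
    # Traunmüller approximation of the Bark scale
    return 26.81 * f / (1960.0 + f) - 0.53

# Each scale compresses high frequencies differently, which is why
# they emphasize different ranges of the spectrogram.
for f in (100, 1000, 4000):
    print(f, round(hz_to_mel(f), 1), round(hz_to_erb_rate(f), 1),
          round(hz_to_bark(f), 2))
```

All three are monotonic warpings of Hz; the log scale (the fourth option) simply plots frequency on a logarithmic axis, giving the most weight to the lowest band.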