Melissa N. Stolar scite author profile

Automatic speech emotion recognition (SER) techniques based on acoustic analysis show high confusion between certain emotional categories. This study used an indirect approach to provide insights into the amplitude-frequency characteristics of different emotions, in order to support the development of future SER methods that differentiate emotions more efficiently. The analysis was carried out by transforming short 1-second blocks of speech into RGB or grey-scale spectrogram images. The images were used to fine-tune a pre-trained image classification network to recognize emotions. Representing the spectrograms on four different frequency scales, namely linear, melodic, equivalent rectangular bandwidth (ERB), and logarithmic, allowed observation of the effects of high, mid-high, mid-low and low frequency characteristics of speech, respectively. Similarly, using only the red (R), green (G) or blue (B) component of the RGB images showed the importance of speech components with high, mid and low amplitude levels, respectively. Experiments conducted on the Berlin emotional speech (EMO-DB) data revealed the relative positions of seven emotional categories (anger, boredom, disgust, fear, joy, neutral and sadness) on the amplitude-frequency plane.
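The link between frequency scale and emphasized frequency range, and between colour channel and amplitude level, can be illustrated with a small sketch. This is not the authors' implementation. The mel and ERB-rate conversions are the commonly used textbook formulas, while the 20 Hz to 8 kHz range, the 1 kHz cutoff, the jet-like colour mapping, and the function names `share_below` and `amplitude_to_rgb` are all assumptions made for illustration.

```python
import math

# Frequency-warping functions (Hz -> scale units).
def hz_to_linear(f):
    return f

def hz_to_mel(f):
    # Common mel formula (O'Shaughnessy).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_erb(f):
    # ERB-rate scale (Glasberg and Moore).
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def hz_to_log(f):
    return math.log10(f)

SCALES = {
    "linear": hz_to_linear,
    "mel": hz_to_mel,
    "erb": hz_to_erb,
    "log": hz_to_log,
}

def share_below(scale, cutoff_hz, f_min=20.0, f_max=8000.0):
    """Fraction of the image's frequency axis spent below cutoff_hz.

    f_min and f_max are assumed values, not taken from the source.
    """
    lo, hi = scale(f_min), scale(f_max)
    return (scale(cutoff_hz) - lo) / (hi - lo)

def amplitude_to_rgb(v):
    """Map a normalized amplitude v in [0, 1] to (r, g, b) in [0, 1].

    Assumed jet-like colour map: blue dominates at low amplitude,
    green at mid amplitude, red at high amplitude.
    """
    v = min(max(v, 0.0), 1.0)

    def ramp(center):
        return min(max(1.5 - abs(4.0 * v - center), 0.0), 1.0)

    return ramp(3.0), ramp(2.0), ramp(1.0)

if __name__ == "__main__":
    for name, scale in SCALES.items():
        print(f"{name:>6}: {share_below(scale, 1000.0):.2f} of the axis lies below 1 kHz")
    for v in (0.0, 0.5, 1.0):
        print(v, amplitude_to_rgb(v))
```

Under these assumptions, the share of the axis given to frequencies below 1 kHz grows from the linear to the mel, ERB and logarithmic scales, which matches the ordering from high to low frequency emphasis described in the abstract. Keeping a single colour channel of such an image retains mostly one amplitude band.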


Real-Time Speech Emotion Recognition Using a Pre-trained Image Classification Network: Effects of Bandwidth Reduction and Companding

Lech, Stolar, Best, et al. 2020. Front. Comput. Sci.

Real time speech emotion recognition using RGB image classification and transfer learning

Stolar, Lech, Bolia, et al. 2017.

Cognitive Load Estimation From Speech Commands to Simulated Aircraft

Vukovic, Stolar, Lech. 2021. IEEE/ACM Trans. Audio Speech Lang. Process.

Using Deep Learning to Identify Potential Roof Spaces for Solar Panels

House, Lech, Stolar. 2018.


Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images
