2023
DOI: 10.3390/s23031743

Multi-Input Speech Emotion Recognition Model Using Mel Spectrogram and GeMAPS

Abstract: The existing research on emotion recognition commonly uses mel spectrogram (MelSpec) and Geneva minimalistic acoustic parameter set (GeMAPS) as acoustic parameters to learn the audio features. MelSpec can represent the time-series variations of each frequency but cannot manage multiple types of audio features. On the other hand, GeMAPS can handle multiple audio features but fails to provide information on their time-series variations. Thus, this study proposes a speech emotion recognition model based on a mult…

Cited by 4 publications (4 citation statements)
References 48 publications
“…The accurate and effective extraction of relevant characteristics, as well as the high correlation among these features, are critical elements that significantly affect the effectiveness of the emotion detection system. Contemporary SER approaches have been positively affected by the introduction of several innovative feature extraction methods [17, 18, 19, 20]. In one study [17], a deep neural network model for SER that could simultaneously learn both MelSpec and GeMAPS audio features was proposed.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Contemporary SER approaches have been positively affected by the introduction of several innovative feature extraction methods [17, 18, 19, 20]. In one study [17], a deep neural network model for SER that could simultaneously learn both MelSpec and GeMAPS audio features was proposed. The three components of the model are the learning of MelSpec in picture format, learning of GeMAPS in vector format, and combining the two to predict emotions.…”
Section: Literature Review (mentioning)
confidence: 99%
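
The three-part structure described in the statement above (an image-style branch, a vector branch, and a fusion step) can be illustrated with a minimal NumPy sketch. This is not the cited paper's architecture: plain dense layers stand in for whatever networks the authors used, the weights are random and untrained, and every name and size here (`init_params`, `predict`, the 64x100 MelSpec shape, the 62-dimensional feature vector, four emotion classes) is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_params(img_shape, vec_dim, hidden, n_classes):
    """Random, untrained weights for the two branches and the fusion layer (hypothetical sizes)."""
    img_dim = img_shape[0] * img_shape[1]
    return {
        "W_img": rng.normal(0.0, 0.01, (img_dim, hidden)),
        "W_vec": rng.normal(0.0, 0.01, (vec_dim, hidden)),
        "W_out": rng.normal(0.0, 0.01, (2 * hidden, n_classes)),
    }

def predict(params, melspec, gemaps):
    """Forward pass: encode each input separately, concatenate, classify."""
    # Branch 1: spectrogram treated as an image, flattened for a dense layer.
    h_img = relu(melspec.reshape(len(melspec), -1) @ params["W_img"])
    # Branch 2: fixed-length acoustic feature vector.
    h_vec = relu(gemaps @ params["W_vec"])
    # Fusion: concatenate both encodings and map to class probabilities.
    fused = np.concatenate([h_img, h_vec], axis=1)
    return softmax(fused @ params["W_out"])

# Assumed shapes: batch of 2, 64 mel bands x 100 frames, 62 vector features, 4 classes.
params = init_params((64, 100), vec_dim=62, hidden=32, n_classes=4)
melspec_batch = rng.normal(size=(2, 64, 100))
gemaps_batch = rng.normal(size=(2, 62))
probs = predict(params, melspec_batch, gemaps_batch)
```

Each row of `probs` is a probability distribution over the assumed emotion classes; with untrained weights the values are close to uniform.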
“…Feature extraction is a critical step in classification and recognition using deep learning algorithms. The Mel filter bank, Gammatone filter bank and Bark filter bank are used to extract the spectrograms of sound signals in speech recognition and classification studies based on sound signals [24][25][26][27]. In order to extract richer features, a method for extracting fusion spectrograms is proposed, as shown in Figure 2.…”
Section: Arc Sound Feature Extraction (mentioning)
confidence: 99%
“…As a result, this study focuses on detecting double-compressed (DC) AMR speech signals. The magnitude of the discrete Fourier transform (DFT) of short speech segments, commonly known as the spectrogram representation of speech signals, has found widespread application in various tasks such as speaker recognition [6], speech recognition [7], emotion recognition [8], and audio event detection [9]. Its effectiveness stems from its ability to capture the spectral content variation of the signal over time, making it suitable for use with deep neural networks (DNNs), such as deep convolutional neural networks (CNNs), and long-short-term memory (LSTM) networks.…”
Section: Introduction (mentioning)
confidence: 99%
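
The spectrogram definition in the statement above (the magnitude of the DFT of short speech segments) can be sketched in a few lines of NumPy. The frame length, hop size, Hann window, and 16 kHz test tone below are assumptions made for illustration, and `magnitude_spectrogram` is a hypothetical helper, not code from any of the cited papers.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Return |DFT| of overlapping Hann-windowed frames, shape (frames, bins)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    return np.abs(np.fft.rfft(frames, axis=1))

# Assumed test input: one second of a 1 kHz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
spec = magnitude_spectrogram(tone)

# Frequency (Hz) of the strongest bin, averaged over time.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sr / 400
```

Each row of `spec` is the spectrum of one short segment, so reading down the rows shows how the spectral content changes over time, which is the property the statement credits for the representation's usefulness with CNNs and LSTMs.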