2023
DOI: 10.3390/electronics12040839
Speech Emotion Recognition Based on Multiple Acoustic Features and Deep Convolutional Neural Network

Abstract: Speech emotion recognition (SER) plays a vital role in human–machine interaction. A large number of SER schemes have been proposed over the last decade. However, the performance of SER systems is limited by high system complexity, poor feature distinctiveness, and noise. This paper presents an acoustic feature set based on Mel frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), wavelet packet transform (WPT), zero crossing rate (ZCR), spectrum cen…
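As a rough illustration of the kind of feature set the abstract describes, the following is a minimal sketch of extracting several of the listed acoustic features with librosa. The file name, frame settings, and coefficient counts are illustrative assumptions, and LPCC/WPT would need additional steps (an LPC-to-cepstrum recursion, PyWavelets) that are only noted in the comments; this is not the feature pipeline reported in the paper itself.

```python
# Sketch: extracting MFCC, ZCR, spectral centroid, and LPC features with librosa.
# "utterance.wav" and all parameter values are hypothetical.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # MFCC per frame
zcr = librosa.feature.zero_crossing_rate(y)                  # zero crossing rate per frame
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)     # spectral centroid per frame
lpc = librosa.lpc(y, order=12)                               # LPC coefficients (LPCC would be
                                                             # derived from these; WPT would use
                                                             # a wavelet library such as PyWavelets)

# Frame-level features are often summarised per utterance before classification.
feature_vector = np.concatenate([
    mfcc.mean(axis=1), zcr.mean(axis=1), centroid.mean(axis=1), lpc,
])
```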

Cited by 40 publications (20 citation statements)
References 41 publications
“…2 constitute the time domain features, which show the impact of emotion on the temporal properties of speech. The voice quality features jitter, root mean square (RMS), and shimmer capture variations in voice quality caused by emotion in terms of amplitude and time [7]. The details of the features are described in Table 1.…”
Section: Methods
confidence: 99%
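For orientation, the following is a simplified sketch of frame-level jitter and shimmer estimates, assuming librosa is available. The pYIN parameters and the frame-RMS approximation of shimmer are assumptions for illustration only; the cited work may use stricter, period-by-period (Praat-style) definitions.

```python
# Sketch: rough jitter and shimmer estimates from frame-wise pitch and RMS.
# All parameter values are illustrative, not taken from the cited paper.
import numpy as np
import librosa

def jitter_shimmer(path, fmin=65.0, fmax=400.0):
    y, sr = librosa.load(path, sr=None)

    # Frame-wise fundamental frequency via the pYIN algorithm.
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    periods = 1.0 / f0[voiced & ~np.isnan(f0)]        # pitch periods of voiced frames (s)

    # Jitter: mean absolute difference of consecutive periods, normalised by the mean period.
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    # Shimmer approximated from frame RMS amplitudes (a common simplification).
    rms = librosa.feature.rms(y=y)[0]
    shimmer = np.mean(np.abs(np.diff(rms))) / np.mean(rms)

    return jitter, shimmer
```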
“…Instead of relying on conventional input devices, this system uses voice commands so that the user and the computer can communicate naturally [1][2][3]. There is a wide variety of applications for SSR, including but not limited to interactive robotics, contact centers, onboard vehicle driving systems, interactive game development, online learning, medical-psychological analysis, and tutoring systems [4][5][6][7].…”
Section: Introduction
confidence: 99%
“…The authors of [41] proposed a methodology for SER that leverages MFCC and a one-dimensional convolutional neural network with the aim of diminishing computational complexity. The approach involves the use of various acoustic properties to present collaborative low-order and high-order features and the development of a lightweight one-dimensional deep convolutional neural network to streamline the deep learning frameworks for SER.…”
Section: Methods
confidence: 99%
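To make the idea of a lightweight 1D CNN over MFCC frames concrete, here is a minimal Keras sketch. The layer sizes, input shape, and number of emotion classes are assumptions for illustration; this is not the architecture reported in [41].

```python
# Sketch: a small 1D CNN classifier over per-frame MFCC features.
# NUM_MFCC, NUM_FRAMES, and NUM_CLASSES are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_MFCC = 40        # MFCC coefficients per frame
NUM_FRAMES = 300     # frames per utterance after padding/truncation
NUM_CLASSES = 7      # number of emotion classes

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, NUM_MFCC)),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```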
“…The RMS value can be calculated for short-time windows of the speech signal, typically in the range of 20–50 milliseconds. The RMS values for these short-time windows can be used to characterize the changes in loudness or energy over time, which can be indicative of changes in emotional content [43].…”
Section: Root Mean Square
confidence: 99%
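A minimal sketch of the short-time RMS trajectory described above, assuming librosa. The 25 ms window and 10 ms hop are illustrative values within the 20–50 ms range mentioned in the quote, and the input file name is hypothetical.

```python
# Sketch: short-time RMS over ~25 ms windows, tracking loudness over time.
import librosa

y, sr = librosa.load("utterance.wav", sr=None)   # hypothetical input file

frame_length = int(0.025 * sr)   # 25 ms analysis window
hop_length = int(0.010 * sr)     # 10 ms hop between windows

# One RMS value per short-time window; the resulting trajectory can be used
# as an energy contour feature for emotion recognition.
rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]
times = librosa.frames_to_time(range(len(rms)), sr=sr, hop_length=hop_length)
```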