2018 International Symposium ELMAR 2018
DOI: 10.23919/elmar.2018.8534630
Time Window Analysis for Automatic Speech Emotion Recognition

Cited by 10 publications (6 citation statements). References 8 publications.
“…After applying a rolling window to the mel spectrogram, a sequence of 6 overlapping images is generated for each audio file and provided as input to our first Learning Module (LM). As seen in figure [3], there are four LMs in our model, and each consists of a time-distributed convolutional layer, a batch normalization layer, an activation function layer, a dropout layer, and lastly a max pooling layer. The convolutional layers have a kernel size of 3 × 3.…”
Section: Methods (mentioning)
confidence: 99%
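The rolling-window step described in the statement above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, hop, and spectrogram dimensions are assumptions chosen so that exactly 6 overlapping windows span the time axis.

```python
import numpy as np

def rolling_windows(mel_spec, n_windows=6, win_frames=64):
    """Slice a mel spectrogram (mels x frames) into n_windows
    overlapping windows along the time axis."""
    n_mels, n_frames = mel_spec.shape
    # hop chosen so the last window ends at the final frame
    hop = (n_frames - win_frames) // (n_windows - 1)
    return np.stack([mel_spec[:, i * hop : i * hop + win_frames]
                     for i in range(n_windows)])

# e.g. a 128-mel spectrogram with 200 time frames
spec = np.random.rand(128, 200)
seq = rolling_windows(spec)
print(seq.shape)  # (6, 128, 64)
```

The resulting sequence of 6 images is what a time-distributed convolutional stack (conv, batch norm, activation, dropout, max pooling per step) would consume, applying the same weights to each window.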
“…This article shows the design to perform effectively with log-mel spectrograms when combined with a CNN+LSTM architecture. The aim of [3] is to find the relation between the duration of the speech segment and the emotion recognition rate. In that paper, the analysis is performed using a CNN model with two convolution layers, where magnitude spectrograms are taken as features.…”
Section: Introduction (mentioning)
confidence: 99%
“…both cases we observe an improvement in the classification performance using the adversarial techniques. Boris Puterka [7] in 2018 proposed a method for speech emotion recognition that analyses the effect of the time window on the results. The SER system uses a CNN, with spectrograms used for feature extraction.…”
Section: Knowledge-Based Techniques (mentioning)
confidence: 99%
“…The recurrent neural network is used to predict at the statement level, while the WRN predicts at the segment level and learns a representation of the spectrogram. In [7], speech emotion recognition uses a CNN, with spectrograms used for feature extraction.…”
Section: Audio Analysis And… (mentioning)
confidence: 99%
“…There are many more or less complex features that could be used in SER systems; e.g., in [19, 20] a Teager energy operator (TEO) was used to dynamically adapt to various pitch and formant distributions to improve their sensitivity to emotional changes. The papers [21, 22] focused on time and speech segmentation aspects involved in SER.…”
Section: Introduction (mentioning)
confidence: 99%