2018 International Symposium ELMAR 2018
DOI: 10.23919/elmar.2018.8534630
Time Window Analysis for Automatic Speech Emotion Recognition

Cited by 10 publications (6 citation statements). References 8 publications.
“…After applying a rolling window to the mel spectrogram, a sequence of 6 overlapping images is generated for each audio file and provided as input to our first Learning Module (LM). As seen in figure [3], there are four LMs in our model, and each consists of a time-distributed convolutional layer, a batch normalization layer, an activation function layer, a dropout layer, and lastly a max pooling layer. The convolutional layers have a kernel size of 3 × 3.…”
Section: Methods (mentioning)
confidence: 99%
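The rolling-window step described in the statement above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window length, hop, and spectrogram dimensions are assumptions chosen so that exactly 6 overlapping windows span the time axis.

```python
import numpy as np

def rolling_windows(mel_spec, n_windows=6, win_frames=64):
    """Slice a mel spectrogram (mels x frames) into n_windows
    overlapping windows along the time axis."""
    n_mels, n_frames = mel_spec.shape
    # hop chosen so the last window ends at the final frame
    hop = (n_frames - win_frames) // (n_windows - 1)
    return np.stack([mel_spec[:, i * hop : i * hop + win_frames]
                     for i in range(n_windows)])

# e.g. a 128-mel spectrogram with 200 time frames
spec = np.random.rand(128, 200)
seq = rolling_windows(spec)
print(seq.shape)  # (6, 128, 64)
```

The resulting sequence of 6 images is what a time-distributed convolutional stack (conv, batch norm, activation, dropout, max pooling per step) would consume, applying the same weights to each window.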
“…This article shows the design to perform effectively with log-mel spectrograms when combined with a CNN+LSTM architecture. The aim of [3] is to find the relation between the duration of the speech segment and the emotion recognition rate. In that paper, the analysis is performed using a CNN model with two convolution layers, where magnitude spectrograms are taken as features.…”
Section: Introduction (mentioning)
confidence: 99%
“…both cases we observe an improvement in the classification performance using the adversarial techniques. Boris Puterka [7] in 2018 proposed a method for speech emotion recognition that analyses the effect of the time window on the results. The SER system uses a CNN, with spectrograms used for feature extraction.…”
Section: Knowledge-Based Techniques (mentioning)
confidence: 99%
“…The recurrent neural network is used to predict at the statement level, while the WRN predicts at the segment level and learns a representation of the spectrogram. In [7], speech emotion recognition uses a CNN, with spectrograms used for feature extraction.…”
Section: Audio Analysis And… (mentioning)
confidence: 99%
“…There are many more or less complex features that could be used in SER systems; e.g., in [19, 20] a Teager energy operator (TEO) was used to dynamically adapt to various pitch and formant distributions to improve their sensitivity to emotional changes. The papers [21, 22] focused on time and speech segmentation aspects involved in SER.…”
Section: Introduction (mentioning)
confidence: 99%