2020
DOI: 10.1007/978-3-030-59277-6_26
Speech Emotion Recognition in Neurological Disorders Using Convolutional Neural Network

Abstract: Detecting emotions from speech is one of the emerging research fields in the area of human information processing. Expressing emotion is a very difficult task for a person with a neurological disorder. Hence, a Speech Emotion Recognition (SER) system may address this by ensuring barrier-free communication. Various studies have been carried out in the area of SER. Therefore, the main objective of this research is to develop a system that can recognize emotion from the speech of a neurologically disordered perso…

Cited by 46 publications (21 citation statements) | References 23 publications
“…The reason for conducting the cross-validation on the whole dataset instead of just the training dataset is to avoid over-fitting or selection bias. The number of folds was set to five based on the better results obtained in previous works on speech data classification for neurological disorders [30]. In Table 1, score values that were within the top five and were not repeated more than ten times have been highlighted.…”
Section: Results
Citation type: mentioning
confidence: 99%
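The five-fold cross-validation over the whole dataset described in this excerpt can be illustrated with a minimal sketch using scikit-learn. The feature matrix, labels, and the stand-in classifier below are illustrative assumptions, not the citing authors' actual CNN pipeline.

```python
# Minimal sketch: 5-fold stratified cross-validation over the whole dataset,
# as described in the citing work. X (speech features) and y (emotion labels)
# are random placeholders; the classifier is a stand-in, not the authors' CNN.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))      # e.g. 40 spectral features per utterance (assumed)
y = rng.integers(0, 4, size=200)    # e.g. 4 emotion classes (assumed)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)
print("Fold accuracies:", np.round(scores, 3), "mean:", round(scores.mean(), 3))
```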
“…Issa et al. used a CNN to classify RAVDESS audio files based on a combination of spectral parameters and reported a recognition rate of 71.61% [55]. For the same files, Zisad et al. obtained an average accuracy of 82.5% by employing data augmentation and using a CNN classifier to distinguish emotions in the dataset [56]. For the same subset of the RAVDESS dataset, a real-time speech recognition system using transfer learning techniques with the VGG16 pre-trained model showed an emotion perception rate of 62.51% [57].…”
Section: E. Analysis of Models Using Multilingual Datasets (Setup 7)
Citation type: mentioning
confidence: 99%
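A hedged sketch of the kind of CNN classifier over spectral features that these excerpts describe is shown below. The input shape, layer sizes, and four emotion classes are illustrative assumptions, not the architectures reported in [55] or [56].

```python
# Minimal sketch of a 1-D CNN over per-frame spectral features (e.g. MFCCs)
# for speech emotion recognition. Shapes and layer sizes are illustrative
# assumptions, not the architectures reported in the cited papers.
import numpy as np
import tensorflow as tf

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 100, 40, 4   # assumed dimensions

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy batch showing the expected tensor shapes (random data, illustration only).
X = np.random.rand(8, NUM_FRAMES, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=8)
model.fit(X, y, epochs=1, verbose=0)
```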
“…This nlpaug [23] method uses word-embedding techniques and various augmenter strategies, such as insertion and substitution, to augment the data at the character, word, and sentence level. To augment (i.e., increase the sample size of) the tweets, we perform character-level augmentation (using the KeyboardAug [24], OcrAug, and RandomAug [25] methods), word-level augmentation (AntonymAug [25], ContextualWordEmbsAug, SpellingAug, SplitAug, SynonymAug, TfIdfAug, WordEmbsAug, BackTranslationAug, and ReservedAug), and sentence-level augmentation (using ContextualWordEmbsForSentenceAug, AbstSummAug, and LambadaAug [26]). Figure 1 shows the steps in our framework; it includes 5 major steps [27][28][29][30][31][32][33][34].…”
Section: Dataset Preparation and Preprocessing
Citation type: mentioning
confidence: 99%
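A minimal sketch of the character- and word-level nlpaug augmenters named in this excerpt is given below. The sample sentence is a placeholder, SynonymAug requires the NLTK WordNet corpus to be installed, and the sentence-level augmenters (which depend on pretrained language models) are omitted for brevity.

```python
# Minimal sketch of nlpaug character- and word-level augmentation, using two of
# the augmenters named in the excerpt (KeyboardAug, SynonymAug). The input text
# is a placeholder, not data from the cited study.
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw

text = "Speech emotion recognition can support barrier-free communication."

char_aug = nac.KeyboardAug()                  # character level: keyboard-typo noise
word_aug = naw.SynonymAug(aug_src="wordnet")  # word level: WordNet synonym substitution

# augment() returns a list of augmented strings in recent nlpaug versions.
print(char_aug.augment(text))
print(word_aug.augment(text))
```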