2020
DOI: 10.1007/978-3-030-59277-6_26
Speech Emotion Recognition in Neurological Disorders Using Convolutional Neural Network

Abstract: Detecting emotions from speech is one of the emerging research fields in the area of human information processing. Expressing emotion is a very difficult task for a person with a neurological disorder. Hence, a Speech Emotion Recognition (SER) system may address this by ensuring barrier-free communication. Various studies have been carried out in the area of SER. Therefore, the main objective of this research is to develop a system that can recognize emotion from the speech of a neurologically disordered perso…

Cited by 46 publications (21 citation statements) | References 23 publications
“…The reason for conducting the cross-validation on the whole dataset instead of just the training dataset is to avoid over-fitting or selection bias. The number of folds was set to five based on the better results obtained in previous works on speech data classification for neurological disorders [30]. In Table 1, score values that were within the top five and were not repeated more than ten times have been highlighted.…”
Section: Results
Citation type: mentioning
confidence: 99%
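The five-fold cross-validation over the whole dataset described in this excerpt can be illustrated with a minimal sketch using scikit-learn. The feature matrix, labels, and the stand-in classifier below are illustrative assumptions, not the citing authors' actual CNN pipeline.

```python
# Minimal sketch: 5-fold stratified cross-validation over the whole dataset,
# as described in the citing work. X (speech features) and y (emotion labels)
# are random placeholders; the classifier is a stand-in, not the authors' CNN.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))      # e.g. 40 spectral features per utterance (assumed)
y = rng.integers(0, 4, size=200)    # e.g. 4 emotion classes (assumed)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)
print("Fold accuracies:", np.round(scores, 3), "mean:", round(scores.mean(), 3))
```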
“…Issa et al. used a CNN to classify RAVDESS audio files based on a combination of spectral parameters and reported a recognition rate of 71.61% [55]. For the same files, Zisad et al. obtained an average accuracy of 82.5% by employing data augmentation and using a CNN classifier to distinguish emotions in the dataset [56]. For the same subset of the RAVDESS dataset, a real-time speech recognition system using transfer learning techniques with the VGG16 pre-trained model showed an emotion perception rate of 62.51% [57].…”
Section: E. Analysis of Models Using Multilingual Datasets (Setup 7)
Citation type: mentioning
confidence: 99%
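A hedged sketch of the kind of CNN classifier over spectral features that these excerpts describe is shown below. The input shape, layer sizes, and four emotion classes are illustrative assumptions, not the architectures reported in [55] or [56].

```python
# Minimal sketch of a 1-D CNN over per-frame spectral features (e.g. MFCCs)
# for speech emotion recognition. Shapes and layer sizes are illustrative
# assumptions, not the architectures reported in the cited papers.
import numpy as np
import tensorflow as tf

NUM_FRAMES, NUM_FEATURES, NUM_CLASSES = 100, 40, 4   # assumed dimensions

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy batch showing the expected tensor shapes (random data, illustration only).
X = np.random.rand(8, NUM_FRAMES, NUM_FEATURES).astype("float32")
y = np.random.randint(0, NUM_CLASSES, size=8)
model.fit(X, y, epochs=1, verbose=0)
```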
“…This nlpaug [23] method uses word-embedding techniques and various augmenter strategies, such as insertion and substitution, to augment the data at the character, word, and sentence level. To augment (i.e., increase the sample size of) the tweets, we perform character-level augmentation (using the KeyboardAug [24], OcrAug, and RandomAug [25] methods), word-level augmentation (AntonymAug [25], ContextualWordEmbsAug, SpellingAug, SplitAug, SynonymAug, TfIdfAug, WordEmbsAug, BackTranslationAug, and ReservedAug), and sentence-level augmentation (using ContextualWordEmbsForSentenceAug, AbstSummAug, and LambadaAug [26]). Figure 1 shows the steps in our framework; it includes 5 major steps [27][28][29][30][31][32][33][34].…”
Section: Dataset Preparation and Preprocessing
Citation type: mentioning
confidence: 99%
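A minimal sketch of the character- and word-level nlpaug augmenters named in this excerpt is given below. The sample sentence is a placeholder, SynonymAug requires the NLTK WordNet corpus to be installed, and the sentence-level augmenters (which depend on pretrained language models) are omitted for brevity.

```python
# Minimal sketch of nlpaug character- and word-level augmentation, using two of
# the augmenters named in the excerpt (KeyboardAug, SynonymAug). The input text
# is a placeholder, not data from the cited study.
import nlpaug.augmenter.char as nac
import nlpaug.augmenter.word as naw

text = "Speech emotion recognition can support barrier-free communication."

char_aug = nac.KeyboardAug()                  # character level: keyboard-typo noise
word_aug = naw.SynonymAug(aug_src="wordnet")  # word level: WordNet synonym substitution

# augment() returns a list of augmented strings in recent nlpaug versions.
print(char_aug.augment(text))
print(word_aug.augment(text))
```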