Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition

Bhanusree, Yalamanchili; Kumar, Samayamantula Srinivas; Rao, Anne Koteswara

doi:10.32890/jict2023.22.1.3

Cited by 4 publications

(3 citation statements)

References 40 publications


“…Zheng et al (2018a) combined CNN with random forest for recognising emotion in speech, CNN was employed to extract the representative feature of emotion from speech data, while Random Forest was used to classify the extracted feature into basic emotion. A CNN and Random Forest hybrid was also used for speech emotion classification in (Yalamanchili et al, 2023). It was reported that CNN-RF performed better than the CNN model.…”

Section: Hybrid Models and Their Potential In Advancing Deep Learning (mentioning)

confidence: 99%

Deep Learning: Historical Overview from Inception to Actualization, Models, Applications and Future Trends

Ekundayo, Ezugwu

2024

Preprint


Deep learning stands at the forefront of contemporary machine learning techniques and is well-known for its outstanding predictive accuracy, adaptability to data variability, and remarkable ability to generalize across diverse domains. These attributes have spurred rapid progress and the emergence of novel iterations within the discipline. Yet, this swift evolution often obscures the foundational breakthroughs, with even trailblazing researchers at risk of fading into obscurity despite their seminal contributions. This study aims to provide a historical narrative of deep learning, tracing its origins from the cybernetic era to its current state-of-the-art status. We critically examine the contributions of individual pioneer scholars who have profoundly influenced the development of deep neural networks under the taxonomy of supervised, unsupervised, and reinforcement learning. Furthermore, the study also discusses the trending deep neural network architectures, explaining their operational principles, confronting associated challenges, exploring real-world applications, and outlining potential future trajectories that could offer a starting point for aspiring researchers in the field.


“…Therefore, a specific type of neural network, the recurrent neural network (RNN), uses previous outputs as inputs and is able to retain the previous time stamp information (Choi et al, 2017). Bhanusree et al (2023) An alternate gating mechanism with a mechanised GRU to resolve this issue was proposed (Chung et al, 2014), incorporating two gate operating mechanisms, the Update and Reset gates. The update gate eliminates the risk of vanishing gradient problems, whereas the reset gate allows for the continuous discarding of stored redundant information.…”

Section: Related Work (mentioning)

confidence: 99%

A Modified Gated Recurrent Unit Approach for Epileptic Electroencephalography Classification

Vinod Prakash, Dharmender Kumar

2023

JICT


Epilepsy is one of the most severe non-communicable brain disorders associated with sudden attacks. Electroencephalography (EEG), a non-invasive technique, records brain activities, and these recordings are routinely used for the clinical evaluation of epilepsy. EEG signal analysis for seizure identification relies on expert manual examination, which is labour-intensive, time-consuming, and prone to human error. To overcome these limitations, researchers have proposed machine learning and deep learning approaches. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) have shown significant results in automating seizure prediction, but due to complex gated mechanisms and the storage of excessive redundant information, these approaches face slow convergence and a low learning rate. The proposed modified GRU approach includes an improved update gate unit that adjusts the update gate based on the output of the reset gate. By decreasing the amount of superfluous data in the reset gate, convergence is accelerated, which improves both learning efficiency and the accuracy of epilepsy seizure prediction. The proposed approach is evaluated on a publicly available epileptic EEG dataset from the University of California, Irvine (UCI) machine learning repository, using accuracy, precision, recall, and F1 score as performance metrics. The proposed modified GRU obtained 98.84% accuracy, 96.9% precision, 97.1% recall, and 97% F1 score. The performance results are significant because they could enhance the diagnosis and treatment of neurological disorders, leading to better patient outcomes.
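For orientation, the update and reset gates mentioned in this abstract can be sketched for a standard GRU cell. This is a minimal scalar illustration of the conventional formulation only, not the modified update gate proposed by the authors, whose exact form the abstract does not give. The function name `gru_step` and the parameter keys (`w_z`, `u_z`, `b_z`, and so on) are hypothetical, and the interpolation convention (the update gate weighting the candidate state) is one of two common choices.

```python
import math


def sigmoid(x):
    """Logistic function used by both gates."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One step of a standard scalar GRU cell (illustrative sketch).

    x      -- current input (float)
    h_prev -- previous hidden state (float)
    p      -- dict of hypothetical scalar weights and biases
    """
    # Update gate: how much of the new candidate state to let in.
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    # Candidate state, computed from the input and the reset-scaled history.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Interpolate between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical all-zero parameters, chosen only to make the arithmetic easy to follow.
zero_params = {k: 0.0 for k in
               ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")}
h_next = gru_step(1.0, 0.8, zero_params)
```

With all parameters at zero, both gates evaluate to 0.5 and the candidate state to 0, so the new state is half the previous one. Pushing the update-gate bias strongly negative keeps the previous state almost unchanged, which is how a GRU carries information across many time steps.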


“…The propose ensemble model performed better as compared to Random Forest and CNN-LSTM. Bhanusree et al [23] proposed a model that used a time-distributed attention-layered CNN for feature extraction and a Random Forest for classification. The proposed model achieved classification accuracies of 92.2% and 90.3% on the RAVDESS and IEMOCAP datasets, respectively.…”

Section: Introduction (mentioning)

confidence: 99%

Audio Based Emotion Classification Using Classifier Ensemble

Mudassar, Ul Haq, Majid et al. 2023

PJETS


This paper presents a novel approach to combining classifier outputs for audio emotion recognition. The proposed classifier ensemble technique combines the confusion matrices of base classifiers. The motivation is that some classifiers with lower overall performance are more accurate on a specific class than others with higher overall accuracy. In this approach, the best results obtained for different emotion classes from various classifiers are combined to create a combined confusion matrix. The performance of this approach was analyzed using three emotional speech databases in different languages, i.e., Berlin emotional speech database (EMO-DB), Italian emotional speech database (EMOVO-DB), and Surrey audio-visual expressed emotion database (SAVEE-DB). The openSMILE toolkit was used to extract a total of 8543 audio features. These features include pitch, energy, intensity, jitter, shimmer, formants, MFCC, MFB, LSP and spectral features. These features were normalized using min-max normalization, while correlation-based feature selection (CFS) with a best-first search approach was used for feature reduction. The classification was performed using five different base classifiers, i.e., SVM, MLP, IBK, AdaBoost, and Random Forest. The experimental results showed better performance for the proposed technique as compared to other state-of-the-art methods. The classification accuracies obtained for seven emotion classes were 91.8%, 83.7%, and 80.5% for the EMO-DB, EMOVO-DB, and SAVEE-DB, respectively.
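The min-max normalization step named in this abstract can be illustrated with a short sketch. It rescales each feature column to the range [0, 1]; the function name `min_max_normalize`, the toy feature values, and the handling of constant columns are assumptions made for illustration, not details taken from the paper.

```python
def min_max_normalize(rows):
    """Scale each feature (column) of `rows` to the range [0, 1].

    rows -- list of equal-length lists of numbers, one list per sample
    """
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant feature has no spread; this sketch maps it to 0.0
            # (an assumption, other conventions exist).
            scaled_row.append((value - low) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled


# Toy data: three samples, two features (the second feature is constant).
features = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
normalized = min_max_normalize(features)
print(normalized)  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```

In practice the minimum and maximum would be computed on the training split only and then reused for the test split, so that no test information leaks into the scaling.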
