Depression Speech Recognition With a Three-Dimensional Convolutional Network

Wang, Hongbo; Liu, Yu; Zhen, Xiaoxiao; Tu, Xuyan

doi:10.3389/fnhum.2021.713823

Cited by 17 publications

(6 citation statements)

References 39 publications

“…A speech emotion recognition system is helpful in medical practice for detecting changes in mental state and emotions. For example, when a patient has mood swings, the system will react rapidly and examine their current psychological state [ 9 ]. As a result, the depression prediction methods might help design better mental health care software and technologies such as intelligent robots.…”

Section: Introduction (mentioning)

confidence: 99%

Arabic Speech Analysis for Classification and Prediction of Mental Illness due to Depression Using Deep Learning

Saba, Khan, Abunadi et al. 2022

Computational Intelligence and Neuroscience

Depression is a globally prevalent mental disorder. Recognizing early signs of depression is critical for evaluating and preventing mental illness. With the progress of machine learning, it is possible to build intelligent systems capable of detecting depressive symptoms using speech analysis. This study presents a hybrid model to identify and predict mental illness due to depression from Arabic speech analysis. The proposed hybrid model combines a convolutional neural network (CNN) and a support vector machine (SVM). Experiments are performed on an Arabic speech benchmark data set of 200 speeches. A total of 70% of the data were reserved for training, while 30% were used to test the proposed model. The hybrid model (CNN + SVM) attained accuracy rates of 90.0% and 91.60% in the training and testing stages, respectively. To validate the results of the proposed hybrid model, a recurrent neural network (RNN) and a CNN were also applied individually to the same data set, and the results were compared. The RNN achieved accuracy rates of 80.70% and 81.60% in the training and testing stages, respectively. The CNN achieved accuracy rates of 88.50% and 86.60% in the training and testing stages, respectively. Based on the analysis, the proposed hybrid model produced better prediction results than the individual RNN and CNN models on the same data set. Furthermore, the proposed model had lower FPR and FNR and higher accuracy, AUC, sensitivity, and specificity than the individual RNN and CNN models in predicting depression. Finally, the findings will help classify depression from Arabic speech and will benefit physicians, psychiatrists, and psychologists in detecting depression.
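
The abstract above compares models by accuracy, sensitivity, specificity, FPR, and FNR. As a minimal sketch of how these rates relate to one another, the function below derives them from confusion-matrix counts. The function name `screening_metrics` and the counts in the usage lines are hypothetical illustrations, not values from the cited study; the only detail borrowed from the abstract is that a 30% test split of 200 recordings gives 60 items.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return common binary-screening rates from confusion-matrix counts.

    tp/fn count depressed speakers classified correctly/incorrectly;
    tn/fp count non-depressed speakers classified correctly/incorrectly.
    Assumes both classes are present (no zero denominators).
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "fpr": fp / (fp + tn),           # equals 1 - specificity
        "fnr": fn / (fn + tp),           # equals 1 - sensitivity
    }


# Hypothetical counts for a 60-item test split (illustration only).
metrics = screening_metrics(tp=25, fp=3, tn=27, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because FPR and FNR are the complements of specificity and sensitivity, a model that lowers both error rates necessarily raises both of the other two, which is consistent with how the abstract groups them.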

“…PLP, and MFCC, called the low-level descriptors, are used to train the multiple classifier systems ( Long et al, 2017 ). The input of the network model is a 3D feature made up of FBANK, the first-order and second-order differences to use the information in speech signals entirely ( Wang et al, 2021 ). The findings of the aforementioned study illustrate that MFCC, PLP, and FBANK as front-end features can refine enough speech details.…”

Section: Related Work (mentioning)

confidence: 99%
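
The citation statement above describes a 3D input built from FBANK features and their first-order and second-order differences. A minimal sketch of that stacking step follows, assuming the filter-bank matrix has already been computed with shape (frames, filters). The regression-style delta formula, the `width=2` context, and the names `delta` and `stack_3d` are assumptions chosen for illustration; the statement does not specify how the differences were computed.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta along the time axis (axis 0).

    Frames beyond the edges are replaced by repeating the first/last frame.
    """
    num_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_3d(fbank):
    """Stack static, first-order, and second-order features as 3 channels."""
    d1 = delta(fbank)
    d2 = delta(d1)
    return np.stack([fbank, d1, d2], axis=0)  # shape: (3, frames, filters)


# Placeholder input: random values standing in for real FBANK features.
fbank = np.random.default_rng(0).normal(size=(100, 40))
print(stack_3d(fbank).shape)
```

Placing the three feature maps on a leading channel axis mirrors how an image's colour channels are laid out, which is one common way to feed such features to a convolutional network.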

Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection

Liu, Li et al. 2023

Front. Neurosci.

Introduction: As a biomarker of depression, the speech signal has attracted the interest of many researchers because it is easy to collect and non-invasive. However, variation in subjects' speech across different scenes and emotional stimuli, the insufficient amount of depression speech data for deep learning, and the variable length of frame-level speech features all affect recognition performance. Methods: To address the above problems, this study proposes a multi-task ensemble learning method based on speaker embeddings for depression classification. First, we extract the Mel Frequency Cepstral Coefficients (MFCC), the Perceptual Linear Predictive Coefficients (PLP), and the Filter Bank (FBANK) from the out-domain dataset (CN-Celeb) and train the Resnet x-vector extractor, Time delay neural network (TDNN) x-vector extractor, and i-vector extractor. Then, we extract the corresponding speaker embeddings of fixed length from the depression speech database of the Gansu Provincial Key Laboratory of Wearable Computing. Support Vector Machine (SVM) and Random Forest (RF) are used to obtain the classification results of speaker embeddings in nine speech tasks. To make full use of the information of speech tasks with different scenes and emotions, we aggregate the classification results of nine tasks into new features and then obtain the final classification results by using a Multilayer Perceptron (MLP). In order to take advantage of the complementary effects of different features, Resnet x-vectors based on different acoustic features are fused in the ensemble learning method. Results: Experimental results demonstrate that (1) MFCC-based Resnet x-vectors perform best among the nine speaker embeddings for depression detection; (2) interview speech is better than picture description speech, and neutral stimulus is the best among the three emotional valences in the depression recognition task; (3) our multi-task ensemble learning method with MFCC-based Resnet x-vectors can effectively identify depressed patients; (4) in all cases, the combination of MFCC-based Resnet x-vectors and PLP-based Resnet x-vectors in our ensemble learning method achieves the best results, outperforming other literature studies using the depression speech database. Discussion: Our multi-task ensemble learning method with MFCC-based Resnet x-vectors can fuse the depression-related information of different stimuli effectively, which provides a new approach for depression detection. The limitation of this method is that the speaker embedding extractors were pre-trained on the out-domain dataset. We will consider using the augmented in-domain dataset for pre-training to improve the depression recognition performance further.

“…There are numerous physiological sensors that have been investigated for the estimation of depression. Some of the common physiological measures employed for the recognition of depression include electroencephalography (EEG) [ 13 ], electrocardiography (ECG) [ 14 ], heart rate variability (HRV) [ 15 ], galvanic skin response (GSR) [ 16 ], actigraphy [ 17 ], and speech signals [ 18 ]. Physiological sensors used for analyzing depression offer several compensations over traditional questionnaires developed by psychologists.…”

Section: Introduction (mentioning)

confidence: 99%

A machine learning based depression screening framework using temporal domain features of the electroencephalography signals

Khan, Umar Saeed, Frnda et al. 2024

PLoS ONE

Depression is a serious mental health disorder affecting millions of individuals worldwide. Timely and precise recognition of depression is vital for appropriate mediation and effective treatment. Electroencephalography (EEG) has surfaced as a promising tool for inspecting the neural correlates of depression and therefore has the potential to contribute effectively to the diagnosis of depression. This study presents an EEG-based mental depressive disorder detection mechanism using a publicly available EEG dataset called Multi-modal Open Dataset for Mental-disorder Analysis (MODMA). This study uses EEG data acquired from 55 participants using 3 electrodes in the resting-state condition. Twelve temporal domain features are extracted from the EEG data using non-overlapping windows of 10 seconds and are then presented to a novel feature selection mechanism. The feature selection algorithm selects the optimal subset of attributes with the highest discriminative power to distinguish patients with mental depressive disorder from healthy controls. The selected EEG attributes are classified using three different classification algorithms, namely Best-First (BF) Tree, k-nearest neighbor (KNN), and AdaBoost. The highest classification accuracy of 96.36% is achieved using BF-Tree with a feature vector length of 12. The proposed mental depressive classification scheme outperforms the existing state-of-the-art depression classification schemes in terms of the number of electrodes used for EEG recording, feature vector length, and the achieved classification accuracy. The proposed framework could be used in psychiatric settings, providing valuable support to psychiatrists.
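
The abstract above extracts temporal-domain features from non-overlapping 10-second windows. The sketch below illustrates that windowing step for a single channel. The three features computed (mean, variance, peak-to-peak amplitude) are illustrative stand-ins only; the abstract does not name its twelve features. The function names and the sampling rate in the usage lines are likewise hypothetical.

```python
from statistics import fmean, pvariance


def split_windows(samples, sample_rate_hz, window_s=10.0):
    """Split a 1-D signal into non-overlapping windows of window_s seconds.

    Trailing samples that do not fill a whole window are dropped.
    """
    size = int(round(sample_rate_hz * window_s))
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    count = len(samples) // size
    return [samples[i * size : (i + 1) * size] for i in range(count)]


def temporal_features(window):
    """Compute a few simple time-domain descriptors for one window."""
    return {
        "mean": fmean(window),
        "variance": pvariance(window),
        "peak_to_peak": max(window) - min(window),
    }


# Hypothetical usage: 25 s of a toy signal sampled at 4 Hz (100 samples).
signal = [float(i % 7) for i in range(100)]
windows = split_windows(signal, sample_rate_hz=4)
rows = [temporal_features(w) for w in windows]
print(len(rows), rows[0])
```

Each window yields one feature row, so a recording produces as many rows as it has complete 10-second segments; a feature selection step would then operate on the columns of that table.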
