Effectiveness of Voice Quality Features in Detecting Depression

Afshan, Amber; Guo, Jianmin; Park, Soo‐Jin; Ravi, Vijay; Flint, Jonathan; Alwan, Abeer

doi:10.21437/interspeech.2018-1399

Cited by 46 publications

(30 citation statements)

References 38 publications


“…This limited generalizability and overfitting are observed for instance in the drop in performance from development to test set in submissions to the AVEC challenges. For results that used held‐out test sets, which are more likely to generalize if they are representative, scores range from close to chance to higher scores including Afshan et al (F1‐score = 0.95) which most likely benefited from having a large sample size (N depressed = 735, N controls = 953) and all participants being the same sex (female). At the same time, Kächele et al obtained one of the highest performances in AVEC 2014 (ie, mean absolute error = 7.08), simply using provided audio baseline features and a random forest classifier (the highest performance combined audio and visual features).…”

Section: Discussion (mentioning)

confidence: 99%

Automated assessment of psychiatric disorders using speech: A systematic review

Low, Bentley, Ghosh. 2020. Laryngoscope Investig Oto

250

268


Objective: There are many barriers to accessing mental health assessments including cost and stigma. Even when individuals receive professional care, assessments are intermittent and may be limited partly due to the episodic nature of psychiatric symptoms. Therefore, machine‐learning technology using speech samples obtained in the clinic or remotely could one day be a biomarker to improve diagnosis and treatment. To date, reviews have only focused on using acoustic features from speech to detect depression and schizophrenia. Here, we present the first systematic review of studies using speech for automated assessments across a broader range of psychiatric disorders.

Methods: We followed the Preferred Reporting Items for Systematic Reviews and Meta‐Analysis (PRISMA) guidelines. We included studies from the last 10 years using speech to identify the presence or severity of disorders within the Diagnostic and Statistical Manual of Mental Disorders (DSM‐5). For each study, we describe sample size, clinical evaluation method, speech‐eliciting tasks, machine learning methodology, performance, and other relevant findings.

Results: 1395 studies were screened of which 127 studies met the inclusion criteria. The majority of studies were on depression, schizophrenia, and bipolar disorder, and the remaining on post‐traumatic stress disorder, anxiety disorders, and eating disorders. 63% of studies built machine learning predictive models, and the remaining 37% performed null‐hypothesis testing only. We provide an online database with our search results and synthesize how acoustic features appear in each disorder.

Conclusion: Speech processing technology could aid mental health assessments, but there are many obstacles to overcome, especially the need for comprehensive transdiagnostic and longitudinal studies. Given the diverse types of data sets, feature extraction, computational methodologies, and evaluation criteria, we provide guidelines for both acquiring data and building machine learning models with a focus on testing hypotheses, open science, reproducibility, and generalizability.

Level of Evidence: 3a


“…The mean age of the control group was 30.1 years (± 12.6 years), whereas the mean age of the depression group was 42.9 years (± 13.0 years). There is no standardization on age controlling in studies: some selected age-matched controls to their samples (Alghowinem et al 2013b; Alghowinem et al 2012; Cummins et al 2015); and some did not (Afshan et al 2018; Cannizzaro et al 2004; Higuchi et al 2018; Jiang et al 2017; Joshi et al 2013; Liu et al 2015; Ozdas et al 2004; Scherer et al 2013). Given this heterogeneity, in this work, we assume the perspective of the majority of revised studies in which age between groups was not controlled.…”

Section: Methods (mentioning)

confidence: 99%

Detection of major depressive disorder using vocal acoustic analysis and machine learning—an exploratory study

et al. 2020


Purpose: Diagnosis and treatment in psychiatry are still highly dependent on reports from patients and on clinician judgment. This fact makes them prone to memory and subjectivity biases. As for other medical fields, where objective biomarkers are available, there has been an increasing interest in the development of such tools in psychiatry. To this end, vocal acoustic parameters have been recently studied as possible objective biomarkers, instead of otherwise invasive and costly methods. Patients suffering from different mental disorders, such as major depressive disorder (MDD), may present with alterations of speech. These can be described as uninteresting, monotonous, and spiritless speech and low voice.

Methods: Thirty-three individuals (11 males) over 18 years old were selected, 22 of which being previously diagnosed with MDD and 11 healthy controls. Their speech was recorded in naturalistic settings, during a routine medical evaluation for psychiatric patients, and in different environments for healthy controls. Voices from third parties were removed. The recordings were submitted to a vocal feature extraction algorithm, and to different machine learning classification techniques.

Results: The results showed that random tree models with 100 trees provided the greatest classification performances. It achieved mean accuracy of 87.5575% ± 1.9490, mean kappa index, sensitivity, and specificity of 0.7508 ± 0.0319, 0.9149 ± 0.0204, and 0.8354 ± 0.0254, respectively, for the detection of MDD.

Conclusion: The use of machine learning classifiers with vocal acoustic features appears to be very promising for the detection of major depressive disorder in this exploratory study, but further experiments with a larger sample will be necessary to validate our findings.
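For readers unfamiliar with the metrics reported in the abstract above (accuracy, kappa index, sensitivity, specificity) and the F1-score quoted elsewhere on this page, the following is a minimal sketch of how they are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical illustrations; they are not data from any of the cited studies, and the sketch assumes "kappa index" refers to Cohen's kappa.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: depressed cases classified correctly/incorrectly (hypothetical labels)
    fp/tn: control cases classified incorrectly/correctly
    Assumes both classes are present and at least one positive prediction,
    so that no denominator is zero.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: agreement beyond what chance alone would produce,
    # given the marginal rates of true and predicted labels.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((tn + fp) / total) * ((tn + fn) / total)
    expected = p_pos + p_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=40, fn=10, fp=5, tn=45).items():
        print(f"{name}: {value:.4f}")
```

Because kappa discounts chance agreement, it can be noticeably lower than accuracy when class sizes are unbalanced, which is one reason studies often report both.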


“…Taken together, our research has replicated previous results in which voice features were found to classify depression and has shown a stable generalizability when applied to new datasets, even under different emotion context. Though the length of voice recordings in our research are around 10s, research on the same interview speech dataset has showed even 10 seconds length can reach ideal classification accuracy [60]. What’s more, short utterance has been proved to be effective in speaker identification [61–64].…”

Section: Discussion (mentioning)

confidence: 99%

Re-examining the robustness of voice features in predicting depression: Compared with baseline of confounders

et al. 2019

Self Cite


A large proportion of patients with depressive disorder do not receive an effective diagnosis, which makes it necessary to find a more objective assessment to facilitate a more rapid and accurate diagnosis of depression. Speech data are easy to acquire clinically, and their association with depression has been studied, although the actual predictive effect of voice features has not been examined. Thus, we do not have a general understanding of the extent to which voice features contribute to the identification of depression. In this study, we investigated the significance of the association between voice features and depression using binary logistic regression, and re-examined the actual classification effect of voice features on depression through classification modeling. Nearly 1000 Chinese females participated in this study. Several different datasets were included as test sets. We found that 4 voice features (PC1, PC6, PC17, PC24; P < 0.05, corrected) made a significant contribution to depression, and that the contribution of the voice features alone reached 35.65% (Nagelkerke's R²). In classification modeling, the voice-based model had consistently higher predictive accuracy (F-measure) than the baseline model of demographic data when tested on different datasets, even across different emotion contexts. The F-measure of voice features alone reached 81%, consistent with existing data. These results demonstrate that voice features are effective in predicting depression and indicate that more sophisticated models based on voice features can be built to help in clinical diagnosis.
