Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Harati, Amir; Shriberg, Elizabeth; Rutowski, Tomasz; Chlebek, Piotr; Lü, Yang; Oliveira, Ricardo Augusto Rabelo

doi:10.1109/icassp39728.2021.9414208

Cited by 18 publications

(15 citation statements)

References 29 publications


“…Usage statistics and survey results from this study taken together indicate that utilizing regular voice recordings of users answering questions from a smartphone app to analyze their levels of anxiety and depression is feasible. Ellipsis Health has previously published results of semantic (Rutowski et al, 2019, 2020) and acoustic (Harati et al, 2021) analysis of speech to detect depression and anxiety using models trained, to the best of our knowledge, with the largest database reported in the literature (Rutowski et al, 2019). We have also previously reported this algorithm performance is maintained (i.e., is portable) when applied to the current study population using long short-term memory (LSTM) models (Rutowski et al, 2020).…”

Section: Discussion (mentioning)

confidence: 99%

“…In ongoing work, large corpora of transcribed speech are used for natural language processing (NLP) training to further develop semantic speech analysis (Lan et al, 2019; Yang et al, 2019). The most relevant NLP advancements have been used for transfer learning and improvements in deep learning architecture like transformers (Bengio, 2012; Vaswani et al, 2017), which maintain performance without using prohibitive amounts of labeled data (Rutowski et al, 2020; Harati et al, 2021).…”

Section: Introduction (mentioning)

confidence: 99%


Feasibility of a Machine Learning-Based Smartphone Application in Detecting Depression and Anxiety in a Generally Senior Population

Lin, Nazreen, Rutowski, et al. 2022

Front. Psychol.

Self Cite


Background: Depression and anxiety create a large health burden and increase the risk of premature mortality. Mental health screening is vital, but more sophisticated screening and monitoring methods are needed. The Ellipsis Health App addresses this need by using semantic information from recorded speech to screen for depression and anxiety. Objectives: The primary aim of this study is to determine the feasibility of collecting weekly voice samples for mental health screening. Additionally, we aim to demonstrate portability and improved performance of Ellipsis’ machine learning models for patients of various ages. Methods: Study participants were current patients at Desert Oasis Healthcare, mean age 63 years (SD = 10.3). Two non-randomized cohorts participated: one with a documented history of depression within 24 months prior to the study (Group Positive), and the other without depression (Group Negative). Participants recorded 5-min voice samples weekly for 6 weeks via the Ellipsis Health App. They also completed PHQ-8 and GAD-7 questionnaires to assess for depression and anxiety, respectively. Results: Protocol completion rate was 61% for both groups. Use beyond protocol was 27% for Group Positive and 9% for Group Negative. The Ellipsis Health App showed an AUC of 0.82 for the combined groups when compared to the PHQ-8 and GAD-7 with a threshold score of 10. Performance was high for senior participants as well as younger age ranges. Additionally, many participants spoke longer than the required 5 min. Conclusion: The Ellipsis Health App demonstrated feasibility in using voice recordings to screen for depression and anxiety among various age groups and the machine learning models using Transformer methodology maintain performance and improve over LSTM methodology when applied to the study population.



“…In addition, the platform offers an oncologist-facing dashboard, which facilitates patient referral for psycho-oncology services and allows timely coordination of care and patient-/person-centered approaches. Finally, Ellipsis Health has published a series of peer-reviewed technical papers validating the machine learning algorithms as well as the speech recognition performance that power the approach [39, 40, 41, 42, 43].…”

Section: Introduction (mentioning)

confidence: 99%

Evaluating the Feasibility and Acceptability of an Artificial-Intelligence-Enabled and Speech-Based Distress Screening Mobile App for Adolescents and Young Adults Diagnosed with Cancer: A Study Protocol

Zhang, Acquati, Aratow, et al. 2022

Cancers


Adolescents and young adults (AYAs) diagnosed with cancer are an age-defined population, with studies reporting up to 45% of the population experiencing psychological distress. Although it is essential to screen and monitor for psychological distress throughout AYAs’ cancer journeys, many cancer centers fail to effectively implement distress screening protocols largely due to busy clinical workflow and survey fatigue. Recent advances in mobile technology and speech science have enabled flexible and engaging methods to monitor psychological distress. However, patient-centered research focusing on these methods’ feasibility and acceptability remains lacking. Therefore, in this project, we aim to evaluate the feasibility and acceptability of an artificial intelligence (AI)-enabled and speech-based mobile application to monitor psychological distress among AYAs diagnosed with cancer. We use a single-arm prospective cohort design with a stratified sampling strategy. We aim to recruit 60 AYAs diagnosed with cancer and to monitor their psychological distress using an AI-enabled speech-based distress monitoring tool over a 6 month period. The primary feasibility endpoint of this study is defined by the number of participants completing four out of six monthly distress assessments, and the acceptability endpoint is defined both quantitatively using the acceptability of intervention measure and qualitatively using semi-structured interviews.


“…Recently, speech-based automatic diagnosis of depression has gained significant momentum [6,7,8] and advancements in deep learning have pushed their performance to newer heights [9,10,11,12,13,14]. However, data scarcity still remains one of the major challenges in building reliable systems for MDD modeling purposes.…”

Section: Introduction (mentioning)

confidence: 99%

FrAUG: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals

Ravi, Wang, Flint, et al. 2022

Preprint


In this paper, a data augmentation method is proposed for depression detection from speech signals. Samples for data augmentation were created by changing the frame-width and the frame-shift parameters during the feature extraction process. Unlike other data augmentation methods (such as VTLP, pitch perturbation, or speed perturbation), the proposed method does not explicitly change acoustic parameters but rather the time-frequency resolution of frame-level features. The proposed method was evaluated using two different datasets, models, and input acoustic features. For the DAIC-WOZ (English) dataset when using the DepAudioNet model and mel-spectrograms as input, the proposed method resulted in an improvement of 5.97% (validation) and 25.13% (test) when compared to the baseline. The improvements for the CONVERGE (Mandarin) dataset when using the x-vector embeddings with CNN as the backend and MFCCs as input features were 9.32% (validation) and 12.99% (test). Baseline systems do not incorporate any data augmentation. Further, the proposed method outperformed commonly used data-augmentation methods such as noise augmentation, VTLP, speed perturbation, and pitch perturbation. All improvements were statistically significant.
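The idea in the abstract above, producing extra training views of one recording by varying frame width and frame shift at feature-extraction time, can be illustrated with a minimal sketch. This is not the FrAUG implementation: the function names, the synthetic input, and the (width, shift) values are hypothetical, and plain log power spectra stand in for the mel-spectrograms and MFCCs named in the abstract.

```python
import numpy as np

def frame_signal(signal, sample_rate, width_ms, shift_ms):
    """Slice a 1-D signal into overlapping frames of the given width and shift."""
    width = int(round(sample_rate * width_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    if len(signal) < width:
        return np.empty((0, width))
    n_frames = 1 + (len(signal) - width) // shift
    # Row i holds samples [i * shift, i * shift + width).
    idx = np.arange(width)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx]

def log_power_spectra(frames):
    """Frame-level log power spectrum (a stand-in for mel or MFCC features)."""
    window = np.hanning(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.log(np.abs(spectrum) ** 2 + 1e-10)

def augment_by_frame_rate(signal, sample_rate, settings):
    """Return one feature matrix per (width_ms, shift_ms) setting."""
    return [
        log_power_spectra(frame_signal(signal, sample_rate, w, s))
        for w, s in settings
    ]

# Assumption: 1 s of synthetic noise at 16 kHz replaces real speech.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)

# Hypothetical settings; the first is a common 25 ms / 10 ms baseline.
settings = [(25, 10), (20, 8), (30, 12)]
views = augment_by_frame_rate(audio, 16000, settings)
for (w, s), feats in zip(settings, views):
    print(f"width={w} ms, shift={s} ms -> {feats.shape}")
```

Each setting yields a matrix with a different number of frames and frequency bins for the same audio, which is the change in time-frequency resolution the abstract describes. A downstream model would need to accept or normalize these differing shapes.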

