Exploiting Vocal Tract Coordination Using Dilated CNNs for Depression Detection in Naturalistic Environments

Huang, Zhaocheng; Epps, Julien; Joachim, Dale

doi:10.1109/icassp40776.2020.9054323

Cited by 31 publications

(27 citation statements)

References 27 publications


“…Previous studies in depression prediction using speech [15,16] have shown the superiority of MFCCs over other audio based features like extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) [7] and DEEP SPECTRUM features [1]. Huang et al [9] showed with their depression classification study that coordination features computed from MFCCs perform better with respect to formants and eGeMAPS features. So to compare how robust and effective the TVs are for detecting schizophrenia, we chose MFCCs as the baseline audio features for our study.…”

Section: Mel-frequency Cepstral Coefficients (MFCCs) (mentioning)

confidence: 99%

“…Huang et al [9] in a recent study with MDD introduces a new channel delay correlation method inspired by TDEC, which uses a different correlation structure with correlations starting from 0 to a delay of 'D' frames (a design choice). The delayed autocorrelations and cross-correlations across channels are stacked to form the FVTC correlation structure.…”

Section: Full Vocal Tract Coordination (FVTC) (mentioning)

confidence: 99%
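The channel-delay correlation described in the statement above can be sketched as follows. This is a minimal illustration, not the implementation from [9]: the function name `channel_delay_correlation`, the per-channel z-normalisation, and the output layout `(D + 1, channels, channels)` are assumptions made for the example.

```python
import numpy as np


def channel_delay_correlation(features, max_delay):
    """Stack delayed auto- and cross-correlations across channels.

    features: array of shape (channels, frames), e.g. MFCC trajectories.
    max_delay: largest delay D in frames (a design choice).
    Returns an array of shape (D + 1, channels, channels), where entry
    [d, i, j] correlates channel i at frame t with channel j at frame t + d.
    The layout and normalisation are assumptions of this sketch.
    """
    x = np.asarray(features, dtype=float)
    n_channels, n_frames = x.shape
    if not 0 <= max_delay < n_frames:
        raise ValueError("max_delay must be in [0, frames - 1]")

    # Z-normalise each channel so that products of samples behave like correlations.
    x = x - x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    x = x / np.where(std > 0, std, 1.0)

    corr = np.empty((max_delay + 1, n_channels, n_channels))
    for d in range(max_delay + 1):
        lead = x[:, : n_frames - d]  # channel i at frame t
        lag = x[:, d:]               # channel j at frame t + d
        corr[d] = lead @ lag.T / (n_frames - d)
    return corr


rng = np.random.default_rng(0)
mfcc_like = rng.standard_normal((4, 200))  # synthetic stand-in for real features
fvtc = channel_delay_correlation(mfcc_like, max_delay=10)
print(fvtc.shape)  # (11, 4, 4)
```

The diagonal of each delay slice holds the delayed autocorrelations and the off-diagonal entries hold the cross-correlations, so the stacked array covers every delay from 0 to D in a single structure.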

“…We designed a CNN model inspired by the one in [9] which takes the FVTC correlation matrix computed in section 3.2 as the input.…”

Section: FVTC CNN Model (FVTC-CNN): Model (mentioning)

confidence: 99%
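To make the idea of a dilated convolution over the delay axis concrete, here is a minimal NumPy sketch. It shows only the dilation mechanism for a single correlation trajectory; the kernel weights, the dilation rates, and the helper name `dilated_conv1d` are hypothetical and do not reproduce the architecture in [9] or in the citing paper.

```python
import numpy as np


def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated cross-correlation, as computed by a CNN layer.

    out[t] = sum_k kernel[k] * signal[t + k * dilation]
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1  # receptive field in samples
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.zeros(n_out)
    for k, weight in enumerate(kernel):
        start = k * dilation
        out += weight * signal[start : start + n_out]
    return out


# Hypothetical example: one channel pair's correlation as a function of delay.
delays = np.arange(40)
correlation_vs_delay = np.cos(delays / 5.0)
kernel = np.array([1.0, 0.0, -1.0])  # illustrative weights, not learned

# Larger dilation rates compare correlation values that are further apart in
# delay, which is one way to obtain multi-scale features with the same kernel.
multi_scale = {d: dilated_conv1d(correlation_vs_delay, kernel, d) for d in (1, 2, 4)}
for rate, response in multi_scale.items():
    print(rate, response.shape)
```

In a trained network the kernel weights are learned and the convolution runs over all channel pairs at once; the sketch keeps a single trajectory so the effect of the dilation rate is easy to see.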

“…Time-delay embedded correlation (TDEC) analysis has shown promising results in assessing neuromotor coordination in Major Depressive Disorder (MDD), and the eigenspectra derived from the correlation matrices have been used effectively for classification of MDD subjects from healthy [17,22,24]. Recently, new multi-scale full vocal tract coordination (FVTC) features generated with a dilated CNN have shown further improvement in classification for selected datasets of MDD subjects [9]. The FVTC method addresses repetitive sampling and matrix discontinuity issues of TDEC analysis by introducing a new channel-delay correlation matrix.…”

Section: Introduction (mentioning)

confidence: 99%
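The eigenspectrum step of a time-delay embedded correlation analysis, as mentioned in the statement above, can be sketched roughly as below. The embedding parameters (`n_delays`, `delay_step`) and the function name `tdec_eigenspectrum` are assumptions for illustration; the cited TDEC studies [17,22,24] may use different delay scales and normalisation.

```python
import numpy as np


def tdec_eigenspectrum(features, n_delays, delay_step):
    """Eigenvalues (descending) of a time-delay embedded correlation matrix.

    features: array of shape (channels, frames).
    Each channel is expanded into n_delays shifted copies spaced delay_step
    frames apart; the correlation matrix of all copies is then decomposed.
    """
    x = np.asarray(features, dtype=float)
    n_channels, n_frames = x.shape
    usable = n_frames - (n_delays - 1) * delay_step
    if usable < 2:
        raise ValueError("not enough frames for the requested embedding")

    # One row per (channel, delay) pair, all trimmed to the same length.
    rows = [
        x[c, k * delay_step : k * delay_step + usable]
        for c in range(n_channels)
        for k in range(n_delays)
    ]
    corr = np.corrcoef(np.vstack(rows))

    # eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse them.
    return np.linalg.eigvalsh(corr)[::-1]


rng = np.random.default_rng(0)
spectrum = tdec_eigenspectrum(rng.standard_normal((3, 300)), n_delays=4, delay_step=2)
print(spectrum.shape)  # (12,)
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the matrix size; how quickly they decay gives a rough indication of how strongly the delayed channels are coupled.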


Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks

Siriwardena

Kitchen

Kelly

et al. 2021

Proceedings of the 2021 International Conference on Multimodal Interaction


This study investigates the speech articulatory coordination in schizophrenia subjects exhibiting strong positive symptoms (e.g. hallucinations and delusions), using two distinct channel-delay correlation methods. We show that the schizophrenic subjects with strong positive symptoms and who are markedly ill pose complex articulatory coordination pattern in facial and speech gestures than what is observed in healthy subjects. This distinction in speech coordination pattern is used to train a multimodal convolutional neural network (CNN) which uses video and audio data during speech to distinguish schizophrenic patients with strong positive symptoms from healthy subjects. We also show that the vocal tract variables (TVs) which correspond to place of articulation and glottal source outperform the Mel-frequency Cepstral Coefficients (MFCCs) when fused with Facial Action Units (FAUs) in the proposed multimodal network. For the clinical dataset we collected, our best performing multimodal network improves the mean F1 score for detecting schizophrenia by around 18% with respect to the full vocal tract coordination (FVTC) baseline method implemented with fusing FAUs and MFCCs.

CCS Concepts: • Computing methodologies → Neural networks; • Social and professional topics → People with disabilities.


“…Sample features include voice quality [17] [16], articulation [18] [19] [20], speech rate [19], and spectral [9] features. Advances in deep learning [21] have led to improved results in a range of affective and behavioral health tasks [22][23] [24][25] [26][27] [28]. In deep learning the focus is to learn feature representation from data.…”

Section: Introduction (mentioning)

confidence: 99%

Speech-Based Depression Prediction Using Encoder-Weight-Only Transfer Learning and a Large Corpus

Harati

Shriberg

Rutowski

et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Speech-based algorithms have gained interest for the management of behavioral health conditions such as depression. We explore a speech-based transfer learning approach that uses a lightweight encoder and that transfers only the encoder weights, enabling a simplified run-time model. Our study uses a large data set containing roughly two orders of magnitude more speakers and sessions than used in prior work. The large data set enables reliable estimation of improvement from transfer learning. Results for the prediction of PHQ-8 labels show up to 27% relative performance gains for binary classification; these gains are statistically significant with a p-value close to zero. Improvements were also found for regression. Additionally, the gain from transfer learning does not appear to require strong source task performance. Results suggest that this approach is flexible and offers promise for efficient implementation.


Generalization of Deep Acoustic and NLP Models for Large-Scale Depression Screening

Harati

Rutowski

Lü

et al. 2022

Biomedical Sensing and Analysis
