2020
DOI: 10.3390/e22060688
Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks

Abstract: This paper proposes a speech-based method for automatic depression classification. The system is based on ensemble learning for Convolutional Neural Networks (CNNs) and is evaluated using the data and the experimental protocol provided in the Depression Classification Sub-Challenge (DCC) at the 2016 Audio–Visual Emotion Challenge (AVEC-2016). In the pre-processing phase, speech files are represented as a sequence of log-spectrograms and randomly sampled to balance positive and negative samples. For the classif…
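The pre-processing steps named in the abstract (a log-spectrogram representation of each speech file, plus random sampling to balance positive and negative samples) could be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the STFT parameters and function names are assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy.signal import spectrogram

def log_spec(signal, sr=16000, n_fft=512, hop=256):
    """Log-spectrogram of a speech signal (illustrative sketch; the
    paper's exact STFT parameters are not given in the abstract)."""
    _, _, S = spectrogram(signal, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    return np.log(S + 1e-10)  # small offset avoids log(0)

def balance_by_random_sampling(pos, neg, seed=0):
    """Randomly subsample the majority class so positive and negative
    sample counts match, as described in the pre-processing phase."""
    rng = np.random.default_rng(seed)
    n = min(len(pos), len(neg))
    pick = lambda xs: [xs[i] for i in rng.choice(len(xs), n, replace=False)]
    return pick(pos), pick(neg)
```

Each balanced, log-compressed spectrogram could then be fed to a CNN in the ensemble.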

Cited by 57 publications (21 citation statements)
References 49 publications (65 reference statements)
“…Therefore, diverse acoustic feature-type groups were explored (e.g., glottal, prosodic, spectral) using the COVAREP speech toolkit [32]. Previously, COVAREP features have been utilized to investigate and automatically recognize voice quality [33, 34], respiratory [35], voice [36, 37], and psychogenic disorders [38, 39]. The COVAREP feature set includes 73 individual glottal, prosodic, and spectral features.…”
Section: Methods (mentioning)
confidence: 99%
“…We extracted a total of 508 acoustic features from each recording using two audio signal analysis Python libraries: pyAudioAnalysis [42] and DisVoice [43]. Feature sets from both libraries have been previously used to classify psychiatric disorders and pathological speech [15, 44–46].…”
Section: Acoustic Features (mentioning)
confidence: 99%
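As an illustration of the kind of frame-level acoustic features such toolkits compute, a minimal numpy sketch might look like the following. This is not the pyAudioAnalysis or DisVoice API; the function name, frame length, and hop are illustrative assumptions, and real toolkits extract many more features per frame.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Compute two classic short-term acoustic features per frame:
    short-term energy and zero-crossing rate. Illustrative only --
    toolkits like pyAudioAnalysis extract dozens of such features."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # each sign change in the frame contributes |diff| == 2
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append((energy, zcr))
    return np.array(feats)  # shape: (n_frames, 2)
```

Stacking many such per-frame measures (and their statistics over the recording) is how feature vectors on the order of hundreds of dimensions arise.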
“…Indeed, answering this question makes it possible both to design the vocal task (i.e., the length of the text or the minimum length of the spontaneous-speech answer) and to choose the length of the chunks when slicing samples to augment data. In the SLEEP corpus, all samples are under 5 s long, with a mean of 3.87 s; these recordings come from longer audio samples that were sliced into chunks of approximately 4 s. The same practice is usually employed in systems designed for depression (9, 73), in which this sample length has been shown to maximize accuracy on the employed dataset. Another study on the same task uses chunks of 10 s (74), but the goal behind this choice is not clearly expressed.…”
Section: Guidelines (mentioning)
confidence: 99%
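The slicing practice described above (cutting longer recordings into chunks of approximately 4 s) could be sketched as follows; the function name and parameters are hypothetical, used only to illustrate the augmentation step.

```python
import numpy as np

def slice_into_chunks(signal, sr=16000, chunk_seconds=4.0):
    """Slice a recording into non-overlapping ~4 s chunks, the common
    data-augmentation practice described above (illustrative sketch;
    a trailing remainder shorter than one chunk is discarded)."""
    chunk_len = int(chunk_seconds * sr)
    n_chunks = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]
```

Each chunk then becomes an independent training sample, multiplying the effective dataset size.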
“…On the other side lies the estimation of psychiatric pathologies that affect both motor actions and cognitive planning and have a complex phenomenology. These include bipolar disorders [57.4% accuracy in Ringeval et al. (7)], autism spectrum disorders [69.4% on four categories in Asgari et al. (8)], and, the most studied so far, depression [88% accuracy in Vázquez-Romero and Gallardo-Antolín (9)]; a complete review is provided in Cummins et al. (10).…”
Section: Introduction (mentioning)
confidence: 99%