Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions

Huang, Zhaocheng; Epps, Julien; Joachim, Dale; Chen, Michael

doi:10.21437/interspeech.2018-1743

Cited by 39 publications

(23 citation statements)

References 25 publications


“…The other aspect impacting the quality of the recordings is the chosen recording device, which takes over the previous dichotomy between realism and control over the collected data. While an increasing number of studies are based on smartphones recordings [e.g., Huang et al (66)], it could be feared that acoustic features extracted from these recordings may suffer from the low recording quality of these devices.…”

Section: Guidelines (mentioning)

confidence: 99%

How to Design a Relevant Corpus for Sleepiness Detection Through Voice?

Martin, Rouas, Micoulaud‐Franchi, et al. 2021

Front. Digit. Health


This article presents research on the detection of pathologies affecting speech through automatic analysis. Voice processing has indeed been used for evaluating several diseases such as Parkinson, Alzheimer, or depression. If some studies present results that seem sufficient for clinical applications, this is not the case for the detection of sleepiness. Even two international challenges and the recent advent of deep learning techniques have still not managed to change this situation. This article explores the hypothesis that the observed average performances of automatic processing find their cause in the design of the corpora. To this aim, we first discuss and refine the concept of sleepiness related to the ground-truth labels. Second, we present an in-depth study of four corpora, bringing to light the methodological choices that have been made and the underlying biases they may have induced. Finally, in light of this information, we propose guidelines for the design of new corpora.


“…This study adopted four datasets that included only 'pataka' task utterances. They were derived from subsets of the Sonde Health 1 (SH1) [21], Sonde Health 2 (SH2) [10,18], Sonde Health 3 (SH3), and Yale depression datasets. Similarly to the SH1 and SH2, the SH3 was privately collected via personal Android and iOS smart devices (e.g.…”

Section: Datasets (mentioning)

confidence: 99%

“…Recently, automatic speech-based depression studies found among a review of dozens of studies [4] that the DDK 'pataka' task has been used due to its clinical history as an evaluative tool and restriction of speakers' phonetic variability unlike conversational speech activities. For example, [10] utilized acoustic speech features from 'pataka' recordings to automatically detect individuals with depression with nearly 70% accuracy. However, still little is known about what kind of influence the number of 'pataka' utterance or rate of speech have on acoustic-based features, and further, the effects these attributes have on automatic speech-based depression classification.…”

Section: Introduction (mentioning)

confidence: 99%


Automatic Elicitation Compliance for Short-Duration Speech Based Depression Detection

Stasak, Huang, Joachim, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


Detecting depression from the voice in naturalistic environments is challenging, particularly for short-duration audio recordings. This enhances the need to interpret and make optimal use of elicited speech. The rapid consonant-vowel syllable combination 'pataka' has frequently been selected as a clinical motor-speech task. However, there is significant variability in elicited recordings, which remains to be investigated. In this multi-corpus study of over 25,000 'pataka' utterances, it was discovered that speech landmark-based features were sensitive to the number of 'pataka' utterances per recording. This landmark feature sensitivity was newly exploited to automatically estimate 'pataka' count and rate, achieving root mean square errors nearly three times lower than chance-level. Leveraging count-rate knowledge of the elicited speech for depression detection, results show that the estimated 'pataka' number and rate are important for normalizing evaluative 'pataka' speech data. Count and/or rate normalized 'pataka' models produced relative reductions in depression classification error of up to 26% compared with non-normalized models.


“…Depressive speech can be detected automatically with high accuracy based on voice cues, even under adverse recording conditions, such as low microphone quality, short utterances, and background environmental noise [19,41]. Not only the detection, but also a severity assessment of depression is possible using a speech sample: In men and women, certain voice features were found to be highly predictive of their HAMD (Hamilton Depression Rating Scale) score, which is the most widely used diagnostic tool to measure a patient's degree of depression and suicide risk [36].…”

Section: Mental Health Assessment (mentioning)

confidence: 99%

Privacy Implications of Voice and Speech Analysis – Information Disclosure by Inference

Kröger, Lutz, Raschke 2020

Privacy and Identity Management. Data for Better Living: AI and Privacy


Internet-connected devices, such as smartphones, smartwatches, and laptops, have become ubiquitous in modern life, reaching ever deeper into our private spheres. Among the sensors most commonly found in such devices are microphones. While various privacy concerns related to microphone-equipped devices have been raised and thoroughly discussed, the threat of unexpected inferences from audio data remains largely overlooked. Drawing from literature of diverse disciplines, this paper presents an overview of sensitive pieces of information that can, with the help of advanced data analysis methods, be derived from human speech and other acoustic elements in recorded audio. In addition to the linguistic content of speech, a speaker's voice characteristics and manner of expression may implicitly contain a rich array of personal information, including cues to a speaker's biometric identity, personality, physical traits, geographical origin, emotions, level of intoxication and sleepiness, age, gender, and health condition. Even a person's socioeconomic status can be reflected in certain speech patterns. The findings compiled in this paper demonstrate that recent advances in voice and speech processing induce a new generation of privacy threats.
