2022
DOI: 10.1109/taffc.2019.2944380

Investigation of Speech Landmark Patterns for Depression Detection

Cited by 32 publications (37 citation statements)
References 52 publications
“…When Epps and his colleagues, including a researcher at Sonde Health, analysed voice samples recorded with high-quality microphones in a lab, they were able to detect depression with roughly 94% accuracy (see 'Depressed tones'). When using speech samples that people recorded in their own environments on their smartphones, the accuracy dropped to less than 75%, the researchers reported in a 2019 paper [6].…”
Section: Tricky Translation (mentioning)
confidence: 99%
“…Second, the scarcity of the two speech landmarks, ±f and ±v, deserves some attention. According to Huang, Epps and Joachim [25] and Ishikawa et al [22], ±f is an indicator of the onset/offset of voiceless fricatives while ±v is an indicator of the onset/offset of voiced fricatives. As all the six fricatives in Taiwanese Mandarin (i.e., /x, ɕ, s, ʂ, ʐ, f/) are voiceless, the scarcity of the landmark feature ±v is understandable.…”
Section: Discussion (mentioning)
confidence: 99%
“…Additional features are added when researchers develop SpeechMark© based on their observations of speech recordings. The specifications and the articulatory interpretations of the six abrupt-consonantal landmarks based on DiCicco and Patel [23], MacAuslan [24], Ishikawa, MacAuslan and Boyce [19], Atkins, Boyce, MacAuslan and Silbert [21], Huang, Epps and Joachim [25] and Ishikawa, Rao, MacAuslan and Boyce [22] are summarized in Table 1. Landmark-based acoustic analysis has been used to study the linguistic behaviors of several populations, including typically developing (TD) adults [18, 19], individuals with dysarthria [23], children with cleft lip and palate [20], simultaneous bilingual children [26] and individuals with dysphonic speech [22].…”
Section: Introduction (mentioning)
confidence: 99%
“…For speech-based depression detection, many approaches have been proposed, and recent years have witnessed a recent shift from conventional acoustic features [2, 11-13] to deep learning [14, 15]. Another category of effective features is based on speech articulation, such as vowel space area [16], vocal tract coordination features [17], and speech landmark-based features [18], because such features are less impacted by environmental noise and handset variability than typical prosodic features [18]. This may be because landmarks are detected based on multiple frequency bands (i.e., of which only some may be affected by noise) and differences in energy from one frame to the next (i.e., so absolute offset in energy across frames due to noise would have a small effect); however, further experimental work would be needed to confirm this.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this study, we propose using speech landmark features as a new method to automatically estimate the number of 'pataka' utterances and rate per recording. Speech landmark features [18] provide information about abrupt articulatory consonant-vowel changes and therefore, seem a logical choice for estimating 'pataka' counts [19]. Furthermore, for automatic depression classification, a new method involving 'pataka' count and rate landmark feature normalization is also explored to help address within-corpus and cross-corpus speaker mismatch to improve system performance.…”
Section: Introduction (mentioning)
confidence: 99%