Purpose: In this study, the authors examined whether rhythm metrics capable of distinguishing languages with high and low temporal stress contrast can also distinguish between control and dysarthric speakers of American English with perceptually distinct rhythm patterns. Methods: Acoustic measures of vocalic and consonantal segment durations were obtained for speech samples from 55 speakers across 5 groups (hypokinetic, hyperkinetic, flaccid-spastic, and ataxic dysarthrias, and controls). Segment durations were used to calculate standard and new rhythm metrics. Discriminant function analyses (DFAs) were used to determine which sets of predictor variables (rhythm metrics) best discriminated between groups (controls vs. dysarthrias, and among the 4 dysarthrias). A cross-validation method was used to test the robustness of each original DFA. Results: The majority of classification functions were more than 80% successful in classifying speakers into their appropriate group. New metrics that combined successive vocalic and consonantal segments emerged as important predictor variables. DFAs pitting each dysarthria group against the combined others resulted in unique constellations of predictor variables that yielded high levels of classification accuracy. Conclusions: This study confirms the ability of rhythm metrics to distinguish control speech from dysarthrias and to discriminate among dysarthria subtypes. Rhythm metrics show promise for use as a rational and objective clinical tool.
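For concreteness, the sketch below (in Python, not drawn from the study's own code) shows how several standard duration-based rhythm metrics of the kind referenced above, such as %V, ΔV, ΔC, VarcoV, and the Pairwise Variability Indices, are typically computed from vocalic and consonantal interval durations. The durations shown are hypothetical, and the study's new combined vocalic-consonantal metrics are not reproduced here.

```python
import numpy as np

def rhythm_metrics(voc, cons):
    """Standard duration-based rhythm metrics; a sketch only, the exact
    metric set used in the study may differ."""
    voc, cons = np.asarray(voc, float), np.asarray(cons, float)
    total = voc.sum() + cons.sum()

    def npvi(d):   # normalized Pairwise Variability Index over successive intervals
        pairs = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2)
        return 100 * pairs.mean()

    def rpvi(d):   # raw Pairwise Variability Index
        return np.abs(np.diff(d)).mean()

    return {
        "%V":     100 * voc.sum() / total,              # proportion of vocalic duration
        "deltaV": voc.std(ddof=1),                      # SD of vocalic intervals
        "deltaC": cons.std(ddof=1),                     # SD of consonantal intervals
        "VarcoV": 100 * voc.std(ddof=1) / voc.mean(),   # rate-normalized deltaV
        "nPVI-V": npvi(voc),
        "rPVI-C": rpvi(cons),
    }

# Hypothetical segment durations (in seconds) from one sentence
print(rhythm_metrics(voc=[0.11, 0.08, 0.15, 0.09], cons=[0.07, 0.12, 0.10, 0.06]))
```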
This study is the third in a series that has explored the source of intelligibility decrement in dysarthria by jointly considering signal characteristics and the cognitive-perceptual processes employed by listeners. A paradigm of lexical boundary error analysis was used to examine this interface by manipulating listener constraints with a brief familiarization procedure. If familiarization allows listeners to extract relevant segmental and suprasegmental information from dysarthric speech, they should obtain higher intelligibility scores than nonfamiliarized listeners, and their lexical boundary error patterns should approximate those obtained in misperceptions of normal speech. Listeners transcribed phrases produced by speakers with either hypokinetic or ataxic dysarthria after being familiarized with other phrases produced by these speakers. Data were compared to those of nonfamiliarized listeners [Liss et al., J. Acoust. Soc. Am. 107, 3415-3424 (2000)]. The familiarized groups obtained higher intelligibility scores than nonfamiliarized groups, and the effects were greater when the dysarthria type of the familiarization procedure matched the dysarthria type of the transcription task. Remarkably, no differences in lexical boundary error patterns were discovered between the familiarized and nonfamiliarized groups. Transcribers of the ataxic speech appeared to have difficulty distinguishing strong and weak syllables in spite of the familiarization. Results suggest that intelligibility decrements arise from the perceptual challenges posed by the degraded segmental and suprasegmental aspects of the signal, but that this type of familiarization process may differentially facilitate mapping segmental information onto existing phonological categories.
This investigation evaluated a possible source of reduced intelligibility in hypokinetic dysarthric speech, namely the mismatch between listeners' perceptual strategies and the acoustic information available in the dysarthric speech signal. A paradigm of error analysis was adopted in which listener transcriptions of phrases were coded for the presence and type of word boundary errors. Seventy listeners heard 60 phrases produced by speakers with hypokinetic dysarthria. The six-syllable phrases alternated strong and weak syllables and ranged in length from three to five words. Lexical boundary violations were defined as erroneous insertions or deletions of lexical boundaries that occurred either before strong or before weak syllables. A total of 1596 lexical boundary errors in the listeners' transcriptions were identified unanimously by three independent judges. The pattern of errors generally conformed to the predictions of the Metrical Segmentation Strategy hypothesis [Cutler and Norris, J. Exp. Psychol. 14, 113-121 (1988)], which posits that listeners attend to strong syllables to identify word onsets. However, the strength of adherence to this pattern varied across speakers. Comparison of acoustic evidence of syllabic strength to lexical boundary error patterns revealed a source of intelligibility deficit associated with this particular type of dysarthric speech pattern.
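The coding scheme described above lends itself to a simple illustration. The sketch below (hypothetical, not the study's materials or data) classifies each lexical boundary error by error type and by the strength of the following syllable, and flags the categories predicted by the Metrical Segmentation Strategy, namely boundary insertions before strong syllables and boundary deletions before weak syllables.

```python
from collections import Counter

def code_lbe(error_type, following_syllable):
    """error_type: 'insertion' or 'deletion'; following_syllable: 'strong' or 'weak'.
    Returns the error category and whether it is predicted by the Metrical
    Segmentation Strategy (insert before strong, delete before weak)."""
    predicted = (error_type, following_syllable) in {("insertion", "strong"),
                                                     ("deletion", "weak")}
    return f"{error_type}-before-{following_syllable}", predicted

# Hypothetical transcription errors, not data from the study
errors = [("insertion", "strong"), ("deletion", "weak"), ("insertion", "weak")]
counts = Counter(code_lbe(t, s) for t, s in errors)
print(counts)
```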
Digital health data are multimodal and high-dimensional. A patient’s health state can be characterized by a multitude of signals including medical imaging, clinical variables, genome sequencing, conversations between clinicians and patients, and continuous signals from wearables, among others. This high-volume, personalized data stream aggregated over patients’ lives has spurred interest in developing new artificial intelligence (AI) models for higher-precision diagnosis, prognosis, and tracking. While the promise of these algorithms is undeniable, their dissemination and adoption have been slow, owing partially to unpredictable AI model performance once deployed in the real world. We posit that one of the rate-limiting factors in developing algorithms that generalize to real-world scenarios is the very attribute that makes the data exciting—their high-dimensional nature. This paper considers how the large number of features in vast digital health data can challenge the development of robust AI models—a phenomenon known as “the curse of dimensionality” in statistical learning theory. We provide an overview of the curse of dimensionality in the context of digital health, demonstrate how it can negatively impact out-of-sample performance, and highlight important considerations for researchers and algorithm designers.
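A small simulation makes the point concrete. In the sketch below (an illustration, not an analysis from the paper), a linear classifier is fit to purely random features and noise labels: as the number of features grows relative to a fixed training set, the model fits the training data nearly perfectly while out-of-sample accuracy remains at chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 50, 500

for n_features in (2, 10, 50, 200):
    X_tr = rng.normal(size=(n_train, n_features))
    X_te = rng.normal(size=(n_test, n_features))
    y_tr = rng.integers(0, 2, n_train) * 2 - 1       # labels are pure noise (+/-1)
    y_te = rng.integers(0, 2, n_test) * 2 - 1

    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)  # least-squares "classifier"
    acc_tr = np.mean(np.sign(X_tr @ w) == y_tr)      # apparent (in-sample) accuracy
    acc_te = np.mean(np.sign(X_te @ w) == y_te)      # out-of-sample accuracy
    print(f"p={n_features:4d}  train acc={acc_tr:.2f}  test acc={acc_te:.2f}")
```

With 200 features and only 50 training samples the model can interpolate the noise labels exactly, so training accuracy approaches 1.0 while test accuracy stays near 0.5, which is the overfitting behavior the paper attributes to high-dimensional data with limited samples.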
Purpose: Previous research demonstrated the ability of temporally based rhythm metrics to distinguish among dysarthrias with different prosodic deficit profiles (J. M. Liss et al., 2009). The authors examined whether comparable results could be obtained by an automated analysis of speech envelope modulation spectra (EMS), which quantifies the rhythmicity of speech within specified frequency bands. Method: EMS analysis was conducted on sentences produced by 43 speakers with 1 of 4 types of dysarthria and healthy controls. The EMS consisted of the spectra of the slow-rate (up to 10 Hz) amplitude modulations of the full signal and 7 octave bands ranging in center frequency from 125 to 8000 Hz. Six variables were calculated for each band relating to peak frequency and amplitude and relative energy above, below, and in the region of 4 Hz. Discriminant function analyses (DFAs) determined which sets of predictor variables best discriminated between and among groups. Results: Each of 6 DFAs identified 2–6 of the 48 predictor variables. These variables achieved 84%–100% classification accuracy for group membership. Conclusions: Dysarthrias can be characterized by quantifiable temporal patterns in acoustic output. Because EMS analysis is automated and requires no editing or linguistic assumptions, it shows promise as a clinical and research tool.
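To illustrate the kind of computation involved, the sketch below outlines EMS-style variables for a single octave band: band-pass filter the signal, extract its amplitude envelope, compute the low-frequency (up to 10 Hz) modulation spectrum, and read off peak frequency, peak amplitude, and energy below, around, and above 4 Hz. The filter settings and the band edges used for the "region of 4 Hz" are assumptions for illustration and may not match the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def ems_band_metrics(x, fs, center_hz=1000.0):
    # Octave band around center_hz (edges at center/sqrt(2) and center*sqrt(2))
    lo, hi = center_hz / np.sqrt(2), center_hz * np.sqrt(2)
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, x)

    env = np.abs(hilbert(band))                  # amplitude envelope
    env = env - env.mean()

    spec = np.abs(np.fft.rfft(env)) / len(env)   # modulation spectrum
    freqs = np.fft.rfftfreq(len(env), d=1 / fs)
    keep = (freqs > 0) & (freqs <= 10)           # slow-rate modulations only
    f, a = freqs[keep], spec[keep]

    energy = a ** 2
    return {
        "peak_freq": f[np.argmax(a)],
        "peak_amp": a.max(),
        "energy_below_4Hz": energy[f < 3].sum(),                # example band edges;
        "energy_around_4Hz": energy[(f >= 3) & (f <= 6)].sum(), # the paper's exact
        "energy_above_4Hz": energy[f > 6].sum(),                # definitions may differ
    }

# Hypothetical usage on 3 s of synthetic noise at 16 kHz (not speech data from the study)
fs = 16000
x = np.random.default_rng(1).normal(size=3 * fs)
print(ems_band_metrics(x, fs, center_hz=1000.0))
```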