The majority of patients with a neurological impairment such as Parkinson's disease (PD) or stroke are affected by dysarthria. Dysarthria is a motor speech impairment that negatively affects speech dimensions such as articulation and loudness. This reduces intelligibility and often hinders daily communication. Intensive and prolonged speech training can increase patients' speech intelligibility. Unfortunately, interventions by speech therapists are generally provided only for a short period, while continued practice is needed to maintain or improve intelligibility. eHealth applications might provide a solution. In our research, we explored whether it is possible to develop a game that is suitable for providing speech training to elderly patients with dysarthria due to PD or stroke. In the game we developed, called Treasure Hunters, two players interact verbally to find the way to the treasure while receiving automatic feedback on voice loudness and pitch. Participants played our game in several sessions and generally appreciated it, hinting at the game's potential for speech training in elderly patients. In a within‐subjects experiment with five dysarthric patients, our game was compared to a non‐game computer‐based speech training system: e‐learning‐based Speech Therapy (EST). We focussed on three variables: speech intelligibility, user satisfaction and user preference. Substantial variability between participants was observed, both in the outcomes on these three variables and in the relations between them. We conclude that “one size fits all” does not apply to computer‐based speech training; a personalised approach is needed.
Incorporating automatic speech recognition (ASR) in individualized speech training applications is becoming more viable thanks to the improved generalization capabilities of neural network-based acoustic models. The main problem in developing applications for dysarthric speech is the relative scarcity of in-domain data. Collecting representative amounts of dysarthric speech data is difficult due to rigorous ethical and medical permission requirements, problems in accessing patients who are generally vulnerable and often subject to changing health conditions and, last but not least, the high variability in speech resulting from different pathological conditions. Developing such applications is even more challenging for languages that have fewer resources, fewer speakers and, consequently, fewer patients than English, as is the case for a mid-sized language like Dutch. In this paper, we investigate a multi-stage deep neural network (DNN) training scheme aimed at obtaining better modeling of dysarthric speech by using only a small amount of in-domain training data. The results show that the system employing the proposed training scheme considerably improves the recognition of Dutch dysarthric speech compared to baseline systems trained in a single stage on either a large amount of normal speech or a small amount of in-domain data.
Measuring the intelligibility of disordered speech is common practice in both clinical and research contexts. Over the years various methods have been proposed and studied, including methods relying on subjective ratings by human judges and objective methods based on speech technology. Many of these methods measure speech intelligibility at the speaker or utterance level. While this may be satisfactory for some purposes, more detailed evaluations might be required in other cases, such as diagnosis and measuring or comparing the outcomes of different types of therapy (by humans or computer programs). In the current paper we investigate intelligibility ratings at three different levels of granularity: utterance, word, and subword level. In a web experiment, 50 speech fragments produced by seven dysarthric speakers were rated by 36 listeners in three ways: a score per utterance on a Visual Analogue Scale, a score per utterance on a Likert scale, and an orthographic transcription. The transcription was used to obtain word and subword (grapheme and phoneme) level ratings using automatic alignment and conversion methods. The implemented phoneme scoring method proved feasible and reliable, and provided a more sensitive and informative measure of intelligibility than the utterance-level ratings. Possible implications for clinical practice and research are discussed.
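To make the idea of transcription-based scoring at different granularities concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a listener's orthographic transcription is compared with a reference text using a plain Levenshtein (edit-distance) alignment, and that the score is one minus the error rate. The function names `edit_distance` and `intelligibility_score` and the Dutch example sentence are hypothetical, and the phoneme level is left out because it would additionally require a grapheme-to-phoneme conversion step.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference unit
                            curr[j - 1] + 1,      # insertion of an extra unit
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def intelligibility_score(reference, transcription, level="word"):
    """Return a score in [0, 1]: 1 minus the edit-distance error rate.

    level="word" compares whitespace-separated tokens;
    level="grapheme" compares letters with spaces removed.
    Both choices are assumptions made for this illustration.
    """
    ref_text = reference.lower()
    hyp_text = transcription.lower()
    if level == "word":
        ref, hyp = ref_text.split(), hyp_text.split()
    elif level == "grapheme":
        ref = list(ref_text.replace(" ", ""))
        hyp = list(hyp_text.replace(" ", ""))
    else:
        raise ValueError("level must be 'word' or 'grapheme'")
    if not ref:
        raise ValueError("reference must not be empty")
    errors = edit_distance(ref, hyp)
    # Clamp at zero: many insertions can push the error rate above 1.
    return max(0.0, 1.0 - errors / len(ref))


if __name__ == "__main__":
    reference = "de kat zit op de mat"   # hypothetical reference sentence
    heard = "de kat zat op de mat"       # hypothetical listener transcription
    print(round(intelligibility_score(reference, heard, "word"), 3))
    print(round(intelligibility_score(reference, heard, "grapheme"), 3))
```

In this example a single misheard vowel costs one of six words at the word level but only one of fifteen letters at the grapheme level, which illustrates why finer-grained units can give a more sensitive measure.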