Interspeech 2020
DOI: 10.21437/interspeech.2020-2049

Aphasic Speech Recognition Using a Mixture of Speech Intelligibility Experts

Abstract: Robust speech recognition is a key prerequisite for semantic feature extraction in automatic aphasic speech analysis. However, standard one-size-fits-all automatic speech recognition models perform poorly when applied to aphasic speech. One reason for this is the wide range of speech intelligibility due to different levels of severity (i.e., higher severity lends itself to less intelligible speech). To address this, we propose a novel acoustic model based on a mixture of experts (MoE), which handles the varyin…
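The abstract only sketches the approach, so below is a minimal, illustrative PyTorch sketch of what a mixture-of-experts acoustic model of this kind could look like: several expert networks, each intended to specialize in a different intelligibility/severity level, combined per frame by a learned gating network. The layer sizes, number of experts, output targets, and class names (IntelligibilityExpert, MoEAcousticModel) are assumptions for illustration, not the configuration used in the paper.

```python
# Hedged sketch of a mixture-of-experts acoustic model over speech
# intelligibility levels. All hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn


class IntelligibilityExpert(nn.Module):
    """One acoustic expert: frame-level features -> target poslogits."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_targets: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_targets),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoEAcousticModel(nn.Module):
    """Gates over intelligibility experts; outputs per-frame log-posteriors."""

    def __init__(self, feat_dim=40, hidden_dim=512, num_targets=2000, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [IntelligibilityExpert(feat_dim, hidden_dim, num_targets)
             for _ in range(num_experts)]
        )
        # The gate predicts per-frame mixture weights from the same features.
        self.gate = nn.Sequential(nn.Linear(feat_dim, num_experts), nn.Softmax(dim=-1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim)
        weights = self.gate(feats)                                          # (B, T, E)
        expert_out = torch.stack([e(feats) for e in self.experts], dim=-1)  # (B, T, C, E)
        mixed = torch.einsum("btce,bte->btc", expert_out, weights)          # (B, T, C)
        return torch.log_softmax(mixed, dim=-1)


if __name__ == "__main__":
    model = MoEAcousticModel()
    dummy = torch.randn(2, 100, 40)  # two utterances, 100 frames, 40-dim features
    print(model(dummy).shape)        # torch.Size([2, 100, 2000])
```

In this sketch the gate is learned jointly with the experts from the acoustic features alone; a severity label or intelligibility score, when available, could instead be fed to the gate.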

Cited by 6 publications (6 citation statements)
References 26 publications
“…The differences in performance between the groups that establish the degree of aphasia severity are quite significant, with up to twice the error on the most severe groups compared with mild cases. These large differences between AQ-level groups are in line with previous publications [23,38,39], whose PER and WER results are summarized in Table 4.…”
Section: Evaluation Results and Discussion (supporting)
confidence: 91%
“…Hence, a fair and balanced comparison between systems and technological approaches cannot always be guaranteed. Nonetheless, in some cases notable improvements can be appreciated between the PER of 52.3 on the moderate-aphasia test group presented in [25] and the more recent PER of 41.7 reported in [39]. These results seem to be in line with the global Syllable Error Rate (SER) of 38.3 reported for the full test set in Cantonese [40], where more than 60% of the test set was composed of mild-severity speech data.…”
Section: Related Work in Aphasic Speech Recognition (supporting)
confidence: 75%
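The PER, WER, and SER figures quoted in the citation statements above all follow the same recipe: a Levenshtein alignment of the hypothesis against the reference, with the total count of substitutions, deletions, and insertions divided by the reference length (over phones, words, or syllables respectively). A minimal, self-contained sketch follows; the example transcripts are made up.

```python
# Hedged sketch of the error-rate metrics discussed above:
# WER over words, PER over phones, SER over syllables.

def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance counting substitutions, insertions, deletions."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)]


def error_rate(ref_tokens: list, hyp_tokens: list) -> float:
    """WER if tokens are words, PER if phones, SER if syllables (in %)."""
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


print(error_rate("the cat sat on the mat".split(),
                 "the cat sat mat".split()))  # 33.3: two deletions out of six words
```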