Weak-Supervised Dysarthria-Invariant Features for Spoken Language Understanding Using an FHVAE and Adversarial Training

Qi, Jinzi; Van hamme, Hugo

doi:10.1109/slt54892.2023.10023085

Cited by 1 publication

(3 citation statements)

References 23 publications

“…Studies that emphasized speech patterns [23,27,36-42] were interested in the formation of words spoken, omissions in their patterns, and inclusion of interesting vocabulary during discourse. These studies were much aligned toward word representation and drawing meaning out of the same by leveraging other speech-independent features, such as the speakers' emotions.…”

Section: Mode Of Meaning Extraction Used (mentioning)

confidence: 99%

“…The studies that used vector encoding [15,38-43,50,51] used NLU-based models, such as long short-term memory neural networks or combinations of gated recurrent unit and convolutional neural networks to achieve the tasks of dialogue assessment in dysarthric speech, language understanding, and semantic pattern tracking.…”

Section: Nature Of Speech Representations Used (mentioning)

confidence: 99%

“…Extensive and well-documented databases, such as the TORGO database and the UA-Speech database, contain data sets that have the potential to yield the in-depth speech patterns necessary for speech comprehension. Studies that used these 2 databases [26,31] (n=2; 6% of the studies) or a hybrid of any other databases (n=11; 37% of the studies) [28,33,37-41,47-49,51] focused more on the application of the data in their proposed models, with little effort going into curation and preprocessing of the data.…”

Section: Databases Used (mentioning)

confidence: 99%

Models and Approaches for Comprehension of Dysarthric Speech Using Natural Language Processing: Systematic Review

Alaka, Kasamani

2023

JMIR Rehabil Assist Technol

Background: Speech intelligibility and speech comprehension for dysarthric speech have attracted much attention recently. Dysarthria is characterized by irregularities in the speed, strength, pitch, breath control, range, steadiness, and accuracy of muscle movements required for articulatory aspects of speech production.

Objective: This study examined the contributions made by other studies involved in dysarthric speech comprehension. We focused on the modes of meaning extraction used in generalizing speaker-listener underpinnings in light of semantic ontology extraction as a desired technique, applied method types, speech representations used, and databases sourced from.

Methods: This study involved a systematic literature review using 7 electronic databases: Cochrane Database of Systematic Reviews, Web of Science Core Collection, Scopus, PubMed, ACM, IEEE Xplore, and Google Scholar. The main eligibility criterion was the extraction of meaning from dysarthric speech using natural language processing or understanding approaches to improve dysarthric speech comprehension. Of 834 search results, 30 studies met the eligibility requirements after screening by 2 independent reviewers, with any lack of consensus resolved through joint discussion or consultation with a third party. To evaluate the studies' methodological quality, the risk of bias assessment was based on the Cochrane risk-of-bias tool version 2 (RoB2), with 23 of the studies (77%) registering low risk of bias and 7 studies (23%) raising some concern over the risk of bias. The overall quality assessment of the study was done using TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis).

Results: Following a review of 30 primary studies, this study revealed that the reviewed studies focused on natural language understanding or clinical approaches, with an increase in proposed solutions from 2020 onwards. Most studies relied on speaker-dependent speech features, while others used speech patterns, semantic knowledge, or hybrid approaches. The prevalent use of vector representation aligned with natural language understanding models, while Mel-frequency cepstral coefficient representation and no representation approaches were applied in neural networks. Hybrid representation studies aimed to reconstruct dysarthric speech or improve comprehension. Comprehensive databases, like TORGO and UA-Speech, were commonly used in combination with other curated databases, while primary data was preferred for specific or unique research objectives.

Conclusions: We found significant gaps in dysarthric speech comprehension, characterized by the lack of inclusion of important listener or speech-independent features in the speech representations, mode of extraction, and data sources used. Further research is therefore proposed on formulating models that accommodate listener and speech-independent features through semantic ontologies, which would help bring these key features into meaning extraction from dysarthric speech.
