Model adaptation and adaptive training for the recognition of dysarthric speech

Sehgal, Siddharth; Cunningham, Stuart

doi:10.18653/v1/w15-5112

Cited by 38 publications

(30 citation statements)

References 23 publications


“…Between the development of DM-NSR and Speech Vision, there have been few other attempts to design dysarthric-specific ASR. The first attempt is [12], in which a whole-word speaker adaptive dysarthric ASR was designed and evaluated on UA-Speech speakers with a vocabulary size of 155 words. Based on whether an ASR system is open-set or closed-set speaker, ASR tasks are categorized into three categories.…”

Section: Related Work (citation type: mentioning; confidence: 99%)

Speech Vision: An End-to-End Deep Learning-Based Dysarthric Automatic Speech Recognition System

Shahamiri

2021

IEEE Trans. Neural Syst. Rehabil. Eng.


Dysarthria is a disorder that affects an individual's speech intelligibility due to the paralysis of muscles and organs involved in the articulation process. As the condition is often associated with physically debilitating disabilities, not only do such individuals face communication problems, but interacting with digital devices can also become a burden. For these individuals, automatic speech recognition (ASR) technologies can make a significant difference, as computers and portable digital devices can become an interaction medium that enables them to communicate with others and with computers. However, ASR technologies have performed poorly in recognizing dysarthric speech, especially for severe dysarthria, due to multiple challenges facing dysarthric ASR systems. We identified that these challenges stem from the alteration and inaccuracy of dysarthric phonemes, the scarcity of dysarthric speech data, and imprecise phoneme labeling. This paper reports on our second dysarthric-specific ASR system, called Speech Vision (SV), which tackles these challenges by adopting a novel approach to dysarthric ASR in which speech features are extracted visually and SV learns to see the shape of the words pronounced by dysarthric individuals. This visual acoustic modeling eliminates phoneme-related challenges. To address the data scarcity problem, SV adopts visual data augmentation techniques, generates synthetic dysarthric acoustic visuals, and leverages transfer learning. When benchmarked against the other state-of-the-art dysarthric ASR systems considered in this study, SV outperformed them, improving recognition accuracy for 67% of UA-Speech speakers, with the biggest improvements achieved for severe dysarthria.
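The abstract says only that SV extracts speech features visually; it does not say how the images are built. One common way to turn a waveform into an image is a log-magnitude spectrogram, sketched below with NumPy. The function name `spectrogram_image`, the Hann window, and the 25 ms frame / 10 ms hop at 16 kHz (400 and 160 samples) are assumptions for illustration, not SV's documented pipeline.

```python
import numpy as np

def spectrogram_image(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Turn a mono waveform into a log-magnitude spectrogram scaled to [0, 1].

    Hypothetical helper. Returns shape (frame_len // 2 + 1, n_frames):
    rows are frequency bins, columns are time frames, so it can be viewed as an image.
    """
    if signal.ndim != 1 or len(signal) < frame_len:
        raise ValueError("expected a 1-D signal at least one frame long")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Overlapping, windowed frames: shape (n_frames, frame_len).
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT magnitude per frame, then log compression.
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)
    # Min-max scale to [0, 1] so the result behaves like a grayscale image.
    span = log_mag.max() - log_mag.min()
    img = (log_mag - log_mag.min()) / (span if span > 0 else 1.0)
    return img.T

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    img = spectrogram_image(np.sin(2 * np.pi * 440.0 * t))  # 1 s, 440 Hz tone
    print(img.shape)  # (201, 98)
```

Each column is one time frame and each row one frequency bin, so an image model (for example a CNN) can treat the result as a grayscale picture of the spoken word.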


“…Speaker selection and speaker adaptation techniques have been employed to improve ASR performance for dysarthric speech in [11,12]. ASR configurations have been designed and optimized using dysarthria severity level cues in [13,14,15].…”

Section: Introduction (citation type: mentioning; confidence: 99%)

Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder

et al. 2018


Dysarthria is a manifestation of disrupted neuromuscular physiology, resulting in uneven, slow, slurred, harsh, or quiet speech. Dysarthric speech poses serious challenges to automatic speech recognition, since such speech is difficult to decipher for both humans and machines. The objective of this work is to enhance dysarthric speech features to match those of healthy control speech. We use a Time-Delay Neural Network based Denoising Autoencoder (TDNN-DAE) to enhance the dysarthric speech features. The enhanced dysarthric speech is then recognized using a DNN-HMM based Automatic Speech Recognition (ASR) engine. This methodology was evaluated for speaker-independent (SI) and speaker-adapted (SA) systems. Absolute improvements of 13% and 3% were observed in ASR performance for the SI and SA systems, respectively, compared with recognition of unenhanced dysarthric speech.
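A TDNN layer looks at each frame together with neighbouring frames at fixed time offsets, which amounts to splicing context frames before an affine transform. The sketch below shows that splicing step and one ReLU layer built on it. The offsets, the edge-clamping policy, and the names `splice_frames` and `tdnn_layer` are hypothetical; the abstract does not give the TDNN-DAE's context windows or layer sizes, only that the network is used to make dysarthric features match those of healthy control speech.

```python
import numpy as np

def splice_frames(feats: np.ndarray, offsets=(-2, -1, 0, 1, 2)) -> np.ndarray:
    """Concatenate each frame with the frames at the given time offsets.

    feats: (T, D) feature matrix, e.g. MFCC frames.
    Returns (T, D * len(offsets)). Indices outside [0, T - 1] are clamped,
    so the first/last frame is repeated at the edges (an assumed policy).
    """
    T = feats.shape[0]
    idx = np.arange(T)
    return np.concatenate([feats[np.clip(idx + o, 0, T - 1)] for o in offsets], axis=1)

def tdnn_layer(feats: np.ndarray, weight: np.ndarray, bias: np.ndarray,
               offsets=(-2, -1, 0, 1, 2)) -> np.ndarray:
    """One TDNN-style layer: splice context frames, apply an affine map, then ReLU.

    weight: (D * len(offsets), H); bias: (H,). Returns (T, H).
    """
    return np.maximum(0.0, splice_frames(feats, offsets) @ weight + bias)

if __name__ == "__main__":
    feats = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 dims each
    print(splice_frames(feats, (-1, 0, 1))[0])         # [0. 1. 2. 0. 1. 2. 3. 4. 5.]
```

Stacking several such layers widens the effective time context at each depth, which is the usual motivation for TDNNs over frame-by-frame DNNs.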


“…In [6], a set of MFCC features that best represent dysarthric acoustic features was selected for use in an Artificial Neural Network (ANN)-based ASR. A hybrid adaptation using maximum likelihood linear regression (MLLR) and MAP [7] has been used to improve dysarthric speech recognition. Voice parameters such as jitter and shimmer, together with multi-taper spectral estimation, have been used along with feature-space maximum likelihood linear regression (fMLLR) transformation and speaker adaptation to improve dysarthric speech recognition [8].…”

Section: Introduction (citation type: mentioning; confidence: 99%)
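The MAP half of the hybrid MLLR/MAP adaptation cited as [7] is commonly applied to Gaussian means: each adapted mean is a weighted average of its speaker-independent prior mean (weight tau) and the adaptation frames (each weighted by its posterior occupation gamma_t(k)). The sketch below implements only this standard mean update; how [7] combines it with MLLR is not described here, and the function name `map_adapt_means` and the default tau are assumptions.

```python
import numpy as np

def map_adapt_means(prior_means: np.ndarray, frames: np.ndarray,
                    resp: np.ndarray, tau: float = 10.0) -> np.ndarray:
    """Standard MAP update of Gaussian means toward a speaker's adaptation data.

    prior_means: (K, D) speaker-independent means.
    frames:      (T, D) adaptation feature vectors.
    resp:        (T, K) posterior occupation probabilities gamma_t(k).
    tau:         prior weight; larger values keep means closer to the prior.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    occ = resp.sum(axis=0)          # (K,) soft frame count per Gaussian
    first_order = resp.T @ frames   # (K, D) sum_t gamma_t(k) * x_t
    # Gaussians with little data (small occ) stay close to their prior mean.
    return (tau * prior_means + first_order) / (tau + occ)[:, None]

if __name__ == "__main__":
    prior = np.array([[0.0, 0.0], [5.0, 5.0]])
    frames = np.array([[2.0, 2.0], [4.0, 4.0]])
    resp = np.array([[1.0, 0.0], [1.0, 0.0]])   # both frames belong to Gaussian 0
    print(map_adapt_means(prior, frames, resp, tau=2.0))
```

In this example Gaussian 0 moves from 0 to 1.5, that is (2 * 0 + 2 + 4) / (2 + 2), while Gaussian 1, which sees no frames, keeps its prior mean of 5. This graceful fallback for unseen Gaussians is why MAP suits the small amounts of per-speaker data typical of dysarthric corpora.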

“…Traditionally, speech intelligibility has been an indicator of severity of the speech disorder [13]. An understanding of severity has contributed to improved speech recognition of dysarthric speech as seen in [7,14,15].…”

Section: Introduction (citation type: mentioning; confidence: 99%)

Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

et al. 2017


Dysarthria is a motor speech disorder resulting in mumbled, slurred, or slow speech that is generally difficult for both humans and machines to understand. Traditional automatic speech recognizers (ASR) perform poorly on dysarthric speech recognition tasks. In this paper, we propose using deep autoencoders to enhance Mel Frequency Cepstral Coefficient (MFCC) based features in order to improve dysarthric speech recognition. Speech from healthy control speakers is used to train an autoencoder, which is in turn used to obtain an improved feature representation for dysarthric speech. Additionally, we analyze the use of severity-based tempo adaptation followed by autoencoder-based speech feature enhancement. All evaluations were carried out on the Universal Access (UA-Speech) dysarthric speech corpus. An overall absolute improvement of 16% was achieved using tempo adaptation followed by an autoencoder-based speech front-end representation for DNN-HMM based dysarthric speech recognition.
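The core idea in this abstract, fitting an autoencoder on healthy control features and then passing dysarthric features through it, can be sketched in NumPy. This is a simplified stand-in: a single tanh hidden layer replaces the paper's deep autoencoder, random arrays replace real MFCC frames, and the layer size, learning rate, and epoch count are assumptions. Tempo adaptation is not shown.

```python
import numpy as np

def train_autoencoder(healthy: np.ndarray, hidden: int = 8, lr: float = 0.1,
                      epochs: int = 300, seed: int = 0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder).

    healthy: (N, D) features from healthy control speakers (stand-in for MFCCs).
    Uses full-batch gradient descent on the mean squared reconstruction error.
    Returns the parameters and the per-epoch training loss.
    """
    rng = np.random.default_rng(seed)
    _, d = healthy.shape
    W1 = rng.normal(0.0, 0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, size=(hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode, then decode back to the input dimension.
        h = np.tanh(healthy @ W1 + b1)
        recon = h @ W2 + b2
        err = recon - healthy
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for loss = mean(err ** 2).
        g_out = 2.0 * err / err.size
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - h ** 2)  # tanh derivative
        g_W1 = healthy.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        # Gradient-descent update (all gradients were computed before any update).
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def enhance(feats: np.ndarray, params) -> np.ndarray:
    """Pass (e.g. dysarthric) features through the trained autoencoder."""
    W1, b1, W2, b2 = params
    return np.tanh(feats @ W1 + b1) @ W2 + b2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    healthy = rng.normal(size=(200, 13))           # placeholder "healthy" frames
    dysarthric = 1.5 * rng.normal(size=(50, 13))   # placeholder "dysarthric" frames
    params, losses = train_autoencoder(healthy)
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(enhance(dysarthric, params).shape)
```

Because the autoencoder is trained only on healthy speech, the intuition is that its reconstruction of a dysarthric frame is pulled toward patterns typical of healthy features, which a recognizer trained on such features may handle better.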

