Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

Vachhani, Bhavik; Bhat, Chitralekha; Das, Biswajit; Kopparapu, Sunil Kumar

doi:10.21437/interspeech.2017-1318

Cited by 26 publications

(20 citation statements)

References 23 publications


“…All evaluations were carried out on Universal Access dysarthric speech corpus computer command words. An absolute improvement of 15% was achieved by using fMLLR transform as compared to our previous work [6]. Additionally, ASR performance improved by 4% using silence pre-processing.…”

Section: Discussion (mentioning)

confidence: 69%
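For reference, fMLLR (feature-space maximum likelihood linear regression) adapts the input features rather than the model: each speaker s gets one affine transform of the acoustic feature vectors, chosen to maximize the likelihood of that speaker's data under the existing acoustic model λ. The quoted passage gives no details, so the following is the standard formulation rather than the authors' exact setup:

```latex
\[
\hat{\mathbf{x}}_t = \mathbf{A}^{(s)}\mathbf{x}_t + \mathbf{b}^{(s)}
  = \mathbf{W}^{(s)}\boldsymbol{\xi}_t,
\qquad
\boldsymbol{\xi}_t = \begin{bmatrix}\mathbf{x}_t \\ 1\end{bmatrix},
\qquad
\mathbf{W}^{(s)} = \begin{bmatrix}\mathbf{A}^{(s)} & \mathbf{b}^{(s)}\end{bmatrix}
\]

\[
\mathbf{W}^{(s)} = \arg\max_{\mathbf{A},\,\mathbf{b}}
  \sum_{t=1}^{T} \Big( \log\lvert\det\mathbf{A}\rvert
  + \log p\big(\mathbf{A}\mathbf{x}_t + \mathbf{b} \mid \lambda\big) \Big)
\]
```

Here x_1, …, x_T are the speaker's adaptation frames, and in practice p(· | λ) is evaluated through EM over the model's Gaussian components. The log|det A| term is the Jacobian of the transform; without it, the likelihood could be raised trivially by shrinking all features toward a high-density region.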

“…While traditional, off-the-shelf Automatic Speech Recognition (ASR) systems perform well for normal speech, this is not the case with atypical dysarthric speech, owing to inter-speaker and intra-speaker inconsistencies in the acoustic space as well as the sparseness of data. Several techniques are employed to improve ASR performance for dysarthric speech: acoustic space enhancement, feature engineering, Deep Neural Networks (DNN), speaker adaptation, and lexical model adaptation, individually or in combination [2,3,4,5,6]. In order to fully exploit machine learning techniques for ASR, suitable data to build these systems is imperative.…”

Section: Introduction (mentioning)

confidence: 99%


Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition

Vachhani, Bhat, Kopparapu

2018

Interspeech 2018

Self Cite


Dysarthria refers to a speech disorder caused by trauma to the brain areas concerned with the motor aspects of speech, giving rise to effortful, slow, slurred or prosodically abnormal speech. Traditional Automatic Speech Recognition (ASR) systems perform poorly on dysarthric speech recognition tasks, owing mostly to insufficient dysarthric speech data. Speaker-related challenges complicate the data collection process for dysarthric speech. In this paper, we explore data augmentation using temporal and speed modifications to healthy speech to simulate dysarthric speech. DNN-HMM based ASR and Random Forest based classification were used to evaluate the proposed method. Synthetically generated dysarthric speech is classified by severity level using a Random Forest classifier trained on actual dysarthric speech. An ASR system trained on healthy speech augmented with simulated dysarthric speech is evaluated on dysarthric speech recognition. All evaluations were carried out using the Universal Access dysarthric speech corpus. Absolute improvements of 4.24% and 2% were achieved using tempo-based and speed-based data augmentation, respectively, compared with ASR performance using healthy speech alone for training.
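The abstract distinguishes tempo-based from speed-based modification without spelling out either. A minimal sketch, assuming the usual meanings: a speed change resamples the waveform, so duration and pitch change together (like playing a tape faster or slower), while a tempo change stretches duration and keeps pitch, here via a basic WSOLA (waveform-similarity overlap-add) time stretch. The function names, frame size, search tolerance and the example factor 0.8 are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def change_speed(signal, factor):
    """Speed perturbation by linear-interpolation resampling.

    factor > 1 plays faster (shorter, higher pitch); factor < 1 slower.
    """
    n_out = int(round(len(signal) / factor))
    positions = np.arange(n_out) * factor  # input position read by each output sample
    return np.interp(positions, np.arange(len(signal)), signal)


def change_tempo(signal, factor, frame=400, tol=200):
    """Tempo change with a basic WSOLA time stretch; pitch is kept.

    Output segments are placed every `hop_out` samples. Each input segment is
    searched within +/- `tol` samples of its nominal position for the best
    cross-correlation with the natural continuation of the previous segment.
    """
    hop_out = frame // 2
    hop_in = hop_out * factor
    window = np.hanning(frame)
    # Zero-pad so every search region and template stays inside the array.
    x = np.concatenate([np.zeros(tol), signal, np.zeros(frame + tol + hop_out)])
    n_frames = int(len(signal) / hop_in)
    out = np.zeros(n_frames * hop_out + frame)
    norm = np.zeros_like(out)
    prev = tol
    for i in range(n_frames):
        nominal = tol + int(round(i * hop_in))
        if i == 0:
            start = nominal
        else:
            template = x[prev + hop_out:prev + hop_out + frame]
            lo = max(nominal - tol, 0)
            region = x[lo:nominal + tol + frame]
            scores = np.correlate(region, template, mode="valid")
            start = lo + int(np.argmax(scores))
        out[i * hop_out:i * hop_out + frame] += window * x[start:start + frame]
        norm[i * hop_out:i * hop_out + frame] += window
        prev = start
    return out / np.maximum(norm, 1e-8)  # undo the window weighting


sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s, 440 Hz
slow_speed = change_speed(tone, 0.8)  # 20000 samples, pitch drops to 352 Hz
slow_tempo = change_tempo(tone, 0.8)  # about 1.25 s, pitch stays near 440 Hz
```

Both calls slow the signal by the same factor; only the speed version also lowers the pitch, which is the practical difference between the two augmentation types the abstract compares. Production tools typically use more refined WSOLA or phase-vocoder implementations.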


“…Our experimental results demonstrated that CLSTM-RNN has the potential to improve the ASR performance as a speaker-independent acoustic model for the patients with ALS. To further improve the ASR accuracies, techniques for session/speaker variability compensation including acoustic feature transformation [25,26], acoustic model adaptation [27], and pronunciation variation modeling [27,28] can be further applied. We speculate that the results may improve once a larger training dataset from more ALS patients is obtained.…”

Section: Discussion (mentioning)

confidence: 99%

Dysarthric Speech Recognition Using Convolutional LSTM Neural Network

Kim, Cao, An, et al. 2018

Interspeech 2018


Dysarthria is a motor speech disorder that impedes the physical production of speech. Speech in patients with dysarthria is generally characterized by poor articulation, breathy voice, and monotonic intonation. Therefore, modeling the spectral and temporal characteristics of dysarthric speech is critical for better performance in dysarthric speech recognition. Convolutional long short-term memory recurrent neural networks (CLSTM-RNNs) have recently been used successfully in normal speech recognition but have rarely been used in dysarthric speech recognition. We hypothesized that CLSTM-RNNs have the potential to capture the distinct characteristics of dysarthric speech, taking advantage of convolutional neural networks (CNNs) for extracting effective local features and LSTM-RNNs for modeling temporal dependencies of the features. In this paper, we investigate the use of CLSTM-RNNs for dysarthric speech recognition. Experimental evaluation on a database collected from nine dysarthric patients showed that our approach provides substantial improvement over both standard CNN and LSTM-RNN based speech recognizers.
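The abstract names the architecture but not its configuration. A minimal, untrained forward-pass sketch in NumPy of the idea it describes: a convolution along the frequency axis extracts local spectral patterns in each frame, an LSTM then models how those patterns evolve over time, and a per-frame softmax produces posteriors over output classes. All sizes (40 filterbank channels, 8 filters of width 5, 64 LSTM units, 10 output classes) are hypothetical, and training is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_freq(feats, kernels):
    """Per-frame 1-D convolution along frequency, followed by ReLU.

    feats: (T, F) log filterbank frames; kernels: (C, K).
    Returns (T, C * (F - K + 1)) flattened feature maps.
    """
    windows = sliding_window_view(feats, kernels.shape[1], axis=1)  # (T, F-K+1, K)
    maps = np.einsum("tfk,ck->tcf", windows, kernels)
    return np.maximum(maps, 0.0).reshape(feats.shape[0], -1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(x, W, U, b):
    """Unidirectional LSTM over time. x: (T, D); W: (4H, D); U: (4H, H); b: (4H,)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)  # input, forget, cell, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


def clstm_forward(feats, p):
    """CNN front end -> LSTM -> per-frame softmax posteriors."""
    hidden = lstm(conv_freq(feats, p["kernels"]), p["W"], p["U"], p["b"])
    logits = hidden @ p["V"].T + p["c"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
T, F, C, K, H, n_out = 50, 40, 8, 5, 64, 10  # hypothetical sizes
D = C * (F - K + 1)
params = {
    "kernels": rng.normal(0, 0.1, (C, K)),
    "W": rng.normal(0, 0.1, (4 * H, D)),
    "U": rng.normal(0, 0.1, (4 * H, H)),
    "b": np.zeros(4 * H),
    "V": rng.normal(0, 0.1, (n_out, H)),
    "c": np.zeros(n_out),
}
posteriors = clstm_forward(rng.normal(size=(T, F)), params)
```

Convolving only along frequency keeps each frame's output aligned with its time index, so the LSTM still sees one vector per frame; this is one common way to combine the two, not necessarily the one used in the paper.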


“…Dysarthric speech was recognized using the same configuration of DNN-HMM as in our previous work [21]. A maximum likelihood estimation (MLE) training approach with 100 senones and 8 Gaussian mixtures was adopted.…”

Section: DNN-HMM Based ASR (mentioning)

confidence: 99%

“…Evaluation of our work is carried out on Universal Access Dysarthric Speech corpus [20]. In our earlier work [21], we had used a Deep Autoencoder to enhance dysarthric test speech features, wherein the DAE was trained using only healthy control speech. This is different from our current work in the DAE configuration and the training protocol followed.…”

Section: Introduction (mentioning)

confidence: 99%

Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder

et al. 2018

Self Cite


Dysarthria is a manifestation of disrupted neuromuscular physiology, resulting in uneven, slow, slurred, harsh or quiet speech. Dysarthric speech poses serious challenges to automatic speech recognition, since it is difficult to decipher for both humans and machines. The objective of this work is to enhance dysarthric speech features to match those of healthy control speech. We use a Time-Delay Neural Network based Denoising Autoencoder (TDNN-DAE) to enhance the dysarthric speech features. The enhanced dysarthric speech is then recognized using a DNN-HMM based Automatic Speech Recognition (ASR) engine. This methodology was evaluated for speaker-independent (SI) and speaker-adapted (SA) systems. Absolute improvements of 13% and 3% were observed in ASR performance for the SI and SA systems, respectively, compared with unenhanced dysarthric speech recognition.
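The abstract describes enhancement as mapping dysarthric features toward healthy control features. A minimal NumPy sketch of that idea under stated assumptions: frame-aligned pairs of dysarthric and healthy feature matrices are available (for example after dynamic time warping; the abstract does not say how pairs were formed), the time-delay aspect is reduced to splicing ±2 neighbouring frames into the input of a single tanh hidden layer, and training is plain full-batch gradient descent on squared error. A real TDNN stacks several such context layers; `splice`, `train_dae` and all sizes are hypothetical.

```python
import numpy as np


def splice(feats, context=2):
    """Stack each frame with its +/- `context` neighbours (edge frames repeated).

    (T, D) -> (T, (2 * context + 1) * D): the input view of one TDNN layer.
    """
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.concatenate([padded[k:k + T] for k in range(2 * context + 1)], axis=1)


def train_dae(dysarthric, healthy, hidden=64, context=2, lr=0.05, epochs=300, seed=0):
    """Fit spliced dysarthric frames -> aligned healthy frame; return an enhancer."""
    rng = np.random.default_rng(seed)
    X = splice(dysarthric, context)
    n, d_in = X.shape
    W1 = rng.normal(0, 1 / np.sqrt(d_in), (d_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, healthy.shape[1]))
    b2 = np.zeros(healthy.shape[1])
    for _ in range(epochs):
        Hd = np.tanh(X @ W1 + b1)
        err = Hd @ W2 + b2 - healthy       # gradient of 0.5 * squared error w.r.t. output
        dH = (err @ W2.T) * (1 - Hd ** 2)  # backpropagate through tanh
        W2 -= lr * Hd.T @ err / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * X.T @ dH / n
        b1 -= lr * dH.mean(axis=0)

    def enhance(feats):
        """Map dysarthric feature frames toward the healthy feature space."""
        return np.tanh(splice(feats, context) @ W1 + b1) @ W2 + b2

    return enhance


rng = np.random.default_rng(1)
healthy = rng.normal(size=(500, 13))  # toy stand-in for healthy MFCC frames
dysarthric = 0.7 * healthy + 0.3 + 0.1 * rng.normal(size=healthy.shape)  # toy distortion
enhance = train_dae(dysarthric, healthy)
enhanced = enhance(dysarthric)  # same shape as the input features
```

In a pipeline like the one the abstract describes, the enhanced features, rather than the raw dysarthric ones, would be passed to the DNN-HMM recognizer.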

