Interspeech 2017
DOI: 10.21437/interspeech.2017-1318

Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

Abstract: Dysarthria is a motor speech disorder, resulting in mumbled, slurred or slow speech that is generally difficult to understand by both humans and machines. Traditional Automatic Speech Recognizers (ASR) perform poorly on dysarthric speech recognition tasks. In this paper, we propose the use of deep autoencoders to enhance the Mel Frequency Cepstral Coefficients (MFCC) based features in order to improve dysarthric speech recognition. Speech from healthy control speakers is used to train an autoencoder which is i…
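The idea in the abstract, training an autoencoder on healthy control speech features and then passing dysarthric features through it, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the single tanh hidden layer, the layer sizes, the learning rate, and the synthetic random arrays standing in for 13-dimensional MFCC frames. The helper names `train_autoencoder` and `enhance` are hypothetical.

```python
import numpy as np


def train_autoencoder(frames, hidden_dim=8, epochs=300, lr=0.01, seed=0):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent.

    Returns the learned parameters and the mean squared error per epoch.
    """
    rng = np.random.default_rng(seed)
    n_frames, dim = frames.shape
    params = {
        "w_enc": rng.normal(0.0, 0.1, (dim, hidden_dim)),
        "b_enc": np.zeros(hidden_dim),
        "w_dec": rng.normal(0.0, 0.1, (hidden_dim, dim)),
        "b_dec": np.zeros(dim),
    }
    losses = []
    for _ in range(epochs):
        # Forward pass: encode with tanh, decode linearly.
        hidden = np.tanh(frames @ params["w_enc"] + params["b_enc"])
        recon = hidden @ params["w_dec"] + params["b_dec"]
        err = recon - frames
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for half the squared error per frame, averaged over frames.
        grad_hidden = (err @ params["w_dec"].T) * (1.0 - hidden ** 2)
        grads = {
            "w_dec": hidden.T @ err / n_frames,
            "b_dec": err.mean(axis=0),
            "w_enc": frames.T @ grad_hidden / n_frames,
            "b_enc": grad_hidden.mean(axis=0),
        }
        for name in params:
            params[name] -= lr * grads[name]
    return params, losses


def enhance(frames, params):
    """Pass feature frames through the trained autoencoder."""
    hidden = np.tanh(frames @ params["w_enc"] + params["b_enc"])
    return hidden @ params["w_dec"] + params["b_dec"]


# Synthetic stand-ins (assumption): real use would load MFCC frames instead.
rng = np.random.default_rng(1)
mixing = rng.normal(0.0, 0.5, (4, 13))
healthy = rng.normal(size=(500, 4)) @ mixing
dysarthric = rng.normal(size=(50, 4)) @ mixing + rng.normal(0.0, 0.3, (50, 13))

params, losses = train_autoencoder(healthy)   # train on healthy speech only
enhanced = enhance(dysarthric, params)        # apply to dysarthric frames
```

In this sketch the network only ever sees the healthy-speech stand-in during training, so its output for a dysarthric frame is pulled toward the structure it learned from healthy frames. The enhanced frames would then replace the raw features as input to the recognizer.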

Cited by 26 publications (20 citation statements)
References 23 publications
“…All evaluations were carried out on Universal Access dysarthric speech corpus computer command words. An absolute improvement of 15% was achieved by using fMLLR transform as compared to our previous work [6]. Additionally, ASR performance improved by 4% using silence pre-processing.…”
Section: Discussion (citation type: mentioning)
confidence: 69%
“…While traditional, off-the-shelf Automatic Speech Recognition (ASR) systems perform well for normal speech, this is not the case with the atypical dysarthric speech owing to the inter-speaker and intra-speaker inconsistencies in the acoustic space as well as the sparseness of data. Several techniques are employed to improve ASR performance for dysarthric speech: acoustic space enhancement, feature engineering, Deep Neural Networks (DNN), speaker adaptation, lexical model adaptation, individually or as a combination thereof [2,3,4,5,6]. In order to exploit the machine learning techniques for ASR fully, suitable data to build these systems is imperative.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Our experimental results demonstrated that CLSTM-RNN has the potential to improve the ASR performance as a speaker-independent acoustic model for the patients with ALS. To further improve the ASR accuracies, techniques for session/speaker variability compensation including acoustic feature transformation [25,26], acoustic model adaptation [27], and pronunciation variation modeling [27,28] can be further applied. We speculate that the results may improve once a larger training dataset from more ALS patients is obtained.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Dysarthric speech was recognized using the same configuration of DNN-HMM as in our previous work [21]. A maximum likelihood estimation (MLE) training approach with 100 senones and 8 Gaussian mixtures was adopted.…”
Section: DNN-HMM Based ASR (citation type: mentioning)
confidence: 99%
“…Evaluation of our work is carried out on Universal Access Dysarthric Speech corpus [20]. In our earlier work [21], we had used a Deep Autoencoder to enhance dysarthric test speech features, wherein the DAE was trained using only healthy control speech. This is different from our current work in the DAE configuration and the training protocol followed.…”
Section: Introduction (citation type: mentioning)
confidence: 99%