Interspeech 2018
DOI: 10.21437/interspeech.2018-1754

Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder

Abstract: Dysarthria is a manifestation of the disruption in the neuromuscular physiology, resulting in uneven, slow, slurred, harsh or quiet speech. Dysarthric speech poses serious challenges to automatic speech recognition, since it is difficult for both humans and machines to decipher. The objective of this work is to enhance dysarthric speech features to match those of healthy control speech. We use a Time-Delay Neural Network based Denoising Autoencoder (TDNN-DAE) to enhance the dysarthric speech feat…

Cited by 14 publications (13 citation statements)
References 21 publications

“…We used the hyperparameters and provided Kaldi recipe of España-Bonet and Fonollosa [4]. 1 The code for our experiments is publicly available. 2 We also chose to model phones independent of their position in words as suggested by Joy and Umesh [7] because of data sparsity and because the lower speaking rates lead to reduced coarticulation effects.…”
Section: HMM/GMM | Citation type: mentioning
confidence: 99%
“…typical speech [2,10,13,23]. Other works investigated transforming pathological speech to be more similar to typical speech, for example with speech enhancement methods [1] or by adjusting speech tempo [22]. Alternatively, Jiao et al. [6] and Xiong et al. [22] have also employed data augmentation techniques to create additional, artificial dysarthric speech data.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…However, the difference in the durations between dysarthric utterances and healthy control utterances is too high to achieve meaningful frame lengths and frame shifts. Some works [1,13] also resort to tempo adaptation to match the durations, which, however, relies heavily on manual design and is sensitive to the adaptation parameters. Different from previous studies, we propose to adaptively align the reconstructed Mel spectrogram sequences and the target one via an attention-based seq2seq decoder, which resembles the Tacotron model [17] for speech synthesis.…”
Section: Speech Feature Reconstruction | Citation type: mentioning
confidence: 99%
“…Dysarthria is a manifestation of the impairment in neuromuscular physiology, resulting in pronunciation errors that include deletions, substitutions, insertions, and distortions of phonemes [1]. Despite the fact that automatic speech recognition (ASR) has made considerable progress with the advent of deep learning approaches [2], it still poses great challenges in building stable dysarthric speech recognition (DSR) systems as it differs in many aspects from typical speech, such as speaking rate and pronunciation.…”
Section: Introduction | Citation type: mentioning
confidence: 99%