Interspeech 2020
DOI: 10.21437/interspeech.2020-1161

Investigation of Data Augmentation Techniques for Disordered Speech Recognition

Abstract: Disordered speech recognition is a highly challenging task. The underlying neuro-motor conditions of people with speech disorders, often compounded with co-occurring physical disabilities, lead to the difficulty in collecting large quantities of speech required for system development. This paper investigates a set of data augmentation techniques for disordered speech recognition, including vocal tract length perturbation (VTLP), tempo perturbation and speed perturbation. Both normal and disordered speech were …

Citations: cited by 34 publications (52 citation statements)
References: 37 publications
“…In order to expand the limited training data of only 15.74 hours after re-segmentation, following the previous research on data augmentation for normal speech [43] and disordered speech [22], speed First, speaker independent speed perturbation with a fixed perturbation factor set {0.9, 1.0, 1.1} was used to expand the participants' speech data by a factor of 3 to about 29 hours. Second, the investigators' speech were further speed perturbed with a different factor set {0.84, 0.95, 1.0, 1.08, 1.27}.…”
Section: Data Augmentation (mentioning)
confidence: 99%
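The speed perturbation described in the statement above can be illustrated with a minimal sketch. It assumes that speed perturbation is implemented as plain resampling by linear interpolation, which is a simplification; the function names `speed_perturb` and `augment`, the 16 kHz sampling rate, and the test tone are all hypothetical and are not taken from the cited papers. The sketch shows why a factor set {0.9, 1.0, 1.1} yields roughly three times the original amount of audio: each copy lasts 1/factor times the original duration.

```python
import numpy as np


def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Resample so the signal plays `factor` times faster at the same sample rate."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(waveform) / factor))
    # Fractional positions in the input that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # Linear interpolation; positions past the last sample are clamped by np.interp.
    return np.interp(src_positions, np.arange(len(waveform)), waveform)


def augment(waveform: np.ndarray, factors=(0.9, 1.0, 1.1)) -> dict:
    """Return one perturbed copy of the waveform per factor."""
    return {f: speed_perturb(waveform, f) for f in factors}


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

copies = augment(tone)
total_seconds = sum(len(x) for x in copies.values()) / sample_rate
```

With these factors the total duration is 1/0.9 + 1 + 1/1.1 times the original, or about 3.02 times, consistent with the "factor of 3" expansion quoted above. Because resampling changes the duration and shifts the spectrum together, it differs from tempo perturbation, which changes only the duration.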
“…Speech segmentation extracted from the original transcripts was first refined by removing excessive silence from each utterance. Speed perturbation based data augmentation methods [22] were used to expand the limited elderly training data by a factor of 4 times. State-of-the-art hybrid DNN-HMM systems featuring lattice-free maximum mutual information (LF-MMI) criterion [23] based sequence discrim- [16] and the speaker dependent LHUC transforms [26] were further exploited.…”
Section: Introduction (mentioning)
confidence: 99%
“…Due to the associate difficulties in controlling the muscles and articulators used in speech production [19], abnormalities including articulation imprecision, reduced intensity and clarity, slower speaking rates and increased disfluencies are observed in disordered speech [20]. Furthermore, temporal or spectral perturbation based data augmentation techniques widely used in both state-of-the-art ASR systems for normal speech [21][22][23][24] and recently those designed for impaired speech [25][26][27] introduce extra diversity. To this end, speaker adaptation techniques play a crucial role in current ASR systems for both normal and disordered speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Dysarthric speaker adaptation of recurrent neural network transducers (RNN-Ts) [4] and lattice-free MMI trained time delay neural networks (TDNNs) [5] via direct model parameter fine-tuning were studied in [38,39]. Learning hidden unit contributions based (LHUC) SAT [33] was investigated in [13,27]. The majority of prior researches on disordered speech adaptation focused on feature transformation and model based adaptation.…”
Section: Introduction (mentioning)
confidence: 99%