2022 · Preprint
DOI: 10.48550/arxiv.2205.06445
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Abstract: Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. It is difficult to collect large quantities of such data for ASR system development due to the mobility issues often found among these users. To this end, data augmentation techniques play a vital role. In contrast to existing data augmentation techniques that only modify the speaking rate or overall shape of spectra…
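The speaking-rate modification that the abstract contrasts against can be sketched in a few lines: resampling the waveform on a new time grid stretches or compresses the signal in time. The function below is a hypothetical pure-Python illustration (the name `speed_perturb` is my own); production ASR pipelines typically apply the equivalent SoX `speed`/`tempo` effects instead.

```python
def speed_perturb(samples, rate):
    """Resample a waveform by linear interpolation to simulate a change
    in speaking rate: rate > 1 shortens (faster speech), rate < 1
    lengthens (slower speech). Illustrative sketch only."""
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate                      # position in the input signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # linear interpolation between the two nearest input samples
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

# Stretch a short ramp by 2x, e.g. to mimic the slower speaking
# rate often observed in dysarthric speech:
slow = speed_perturb([0.0, 1.0, 2.0, 3.0], rate=0.5)
# → [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Such a global rate change affects every frame uniformly, which is exactly the limitation the paper's personalized adversarial augmentation is designed to go beyond.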

Cited by 2 publications (3 citation statements) · References 81 publications
“…A second approach is to decrease the model size [7], or to train an inserted small module instead of finetuning the whole model [8,9], so the number of parameters learned on the dysarthric data is limited. Thirdly and differently from the solutions that work on training strategy or model structure, [10,11,12,13] focus directly on the data and do augmentation to generate more dysarthric speech for use in training.…”
Section: Introduction
confidence: 99%
“…For example, dysarthric speakers of very low speech intelligibility exhibit clearer patterns of articulatory imprecision, decreased volume and clarity, increased dysfluencies, slower speaking rate and changes in pitch [29], while those diagnosed with mid or high speech intelligibility are closer to normal speakers. Such heterogeneity further increases the mismatch against normal speech and the difficulty in both speaker-independent (SI) ASR system development using limited impaired speech data and fine-grained personalization to individual users' data [3,25,30]. So far the majority of prior researches to address the dysarthric speaker level diversity have been focused on using speaker-identity only either in speaker-dependent (SD) data augmentation [7,9,13,14,18,27], or in speaker adapted or dependent ASR system development [1, 3, 4, 7, 11-13, 19, 22, 25, 31-33]. In contrast, very limited prior researches have used speech impairment severity information for dysarthric speech recognition.…”
Section: Introduction
confidence: 99%
“…A set of novel techniques and recipe configurations were proposed to learn both speech impairment severity and speaker-identity when constructing and personalizing these systems. In contrast, prior researches mainly focused on using speaker-identity only in speaker-dependent data augmentation [7,9,13,14,18,27] and speaker adapted or dependent ASR system development [1,3,4,7,11,13,19,22,23,25,[31][32][33]. Very limited prior researches utilized speech impairment severity information [2,11,25,[34][35][36].…”
Section: Introduction
confidence: 99%