Interspeech 2022
DOI: 10.21437/interspeech.2022-10896
Speaker adaptation for Wav2vec2 based dysarthric ASR

Cited by 19 publications (14 citation statements)
References 0 publications
“…In contrast, limited previous speaker adaptation research has been conducted for E2E ASR systems. Among these, auxiliary speaker adaptive features based on i-vector [21], [23], x-vector [23], [60] and f-MLLR [60], or extracted from sequence summary network [61], attention-based speaker memory [62] and speaker-aware modules [63], [64] are incorporated into attention-based encoder-decoder, or conventional, non-convolution augmented Transformer models. Several model-based speaker adaptation methods for E2E models utilize speaker adaptive neural network internal components, for example, SD neural beamforming, encoder, attention or decoder modules in multichannel E2E systems [65], while keeping the parameters of other components fixed as speaker independent during adaptation.…”
Section: B. Speaker Adaptation for E2E ASR Systems
confidence: 99%
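
As an illustrative sketch of the auxiliary-feature approach described in this excerpt, the PyTorch fragment below tiles a speaker embedding (e.g., a 192-dim x-vector) across time and fuses it with the acoustic encoder output before the decoder or CTC head. All module names and dimensions here are assumptions for illustration, not taken from any of the cited systems.

    # Minimal sketch: fuse a per-speaker embedding with encoder frames.
    import torch
    import torch.nn as nn

    class SpeakerAwareFusion(nn.Module):
        def __init__(self, enc_dim=512, spk_dim=192, out_dim=512):
            super().__init__()
            # Project [frame features ; speaker embedding] back to model width.
            self.proj = nn.Linear(enc_dim + spk_dim, out_dim)

        def forward(self, enc_out, spk_emb):
            # enc_out: (batch, time, enc_dim); spk_emb: (batch, spk_dim)
            T = enc_out.size(1)
            spk = spk_emb.unsqueeze(1).expand(-1, T, -1)  # tile over time
            return self.proj(torch.cat([enc_out, spk], dim=-1))

    # Usage: 512-dim encoder frames fused with a 192-dim x-vector.
    fusion = SpeakerAwareFusion()
    frames = torch.randn(4, 100, 512)
    xvec = torch.randn(4, 192)
    adapted = fusion(frames, xvec)  # (4, 100, 512)
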
“…In contrast, prior research on structured transform-based speaker adaptation was limited to conventional hybrid DNN-HMM based ASR systems [36]-[49]. Existing speaker adaptation methods designed for E2E ASR systems, by contrast, largely focused on using auxiliary speaker adaptive features [21], [23], [60]-[64], or on directly adapting the whole or certain components of the SI models [22], [65]-[67]. The speaker-level data sparsity issue encountered in these data-intensive model-based adaptation approaches remains largely unaddressed.…”
Section: B. Speaker Adaptation for E2E ASR Systems
confidence: 99%
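
The component-wise adaptation idea discussed in this excerpt can be sketched for a wav2vec2 CTC model by freezing the speaker-independent parameters and updating only a small subset per speaker, which also limits exposure to speaker-level data sparsity. The choice of LayerNorm parameters and the CTC head below is an illustrative assumption, not the recipe of any cited paper.

    # Minimal sketch: per-speaker adaptation of a small parameter subset.
    from transformers import Wav2Vec2ForCTC

    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    for name, param in model.named_parameters():
        # Adapt only LayerNorm parameters and the output head; freeze the rest
        # as speaker-independent.
        param.requires_grad = ("layer_norm" in name) or name.startswith("lm_head")

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"adapting {trainable / total:.2%} of parameters per speaker")
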
“…Recent efforts on dysarthric speech recognition focused on personalized or tuned ASR models (e.g., [4,5,6,7,8,9,10]), which leverage large proprietary or non-commercial datasets of atypical speech (e.g., Project Euphonia [1]; UASpeech [11]; AphasiaBank [12]). In this work, we take a more pragmatic approach that does not require vast quantities of data, yet enables people with severe speech differences to train phrase recognition models for applications where only a constrained set of phrases is needed.…”
Section: Introduction
confidence: 99%
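
A minimal sketch of constrained phrase recognition in the spirit of this excerpt: score each candidate phrase with the CTC loss under a pretrained wav2vec2 model and return the lowest-loss phrase. The checkpoint is a standard public one; the scoring scheme and phrase list are assumed simplifications, not the cited system.

    # Minimal sketch: pick the best match from a fixed phrase set via CTC loss.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h").eval()

    def best_phrase(waveform, phrases, sample_rate=16000):
        inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():
            log_probs = model(inputs.input_values).logits.log_softmax(-1)  # (1, T, C)
        log_probs = log_probs.transpose(0, 1)  # CTC loss expects (T, N, C)
        T = log_probs.size(0)
        losses = []
        for phrase in phrases:
            # This checkpoint uses an uppercase character vocabulary.
            target = processor.tokenizer(phrase.upper(), return_tensors="pt").input_ids
            loss = torch.nn.functional.ctc_loss(
                log_probs, target,
                input_lengths=torch.tensor([T]),
                target_lengths=torch.tensor([target.size(1)]),
                blank=model.config.pad_token_id)
            losses.append(loss.item())
        return phrases[min(range(len(phrases)), key=lambda i: losses[i])]
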
“…A set of novel techniques and recipe configurations were proposed to learn both speech impairment severity and speaker identity when constructing and personalizing these systems. In contrast, prior research mainly focused on using speaker identity only, in speaker-dependent data augmentation [7,9,13,14,18,27] and in speaker-adapted or speaker-dependent ASR system development [1,3,4,7,11,13,19,22,23,25,31-33]. Very little prior research utilized speech impairment severity information [2,11,25,34-36].…”
Section: Introduction
confidence: 99%
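
One simple way to expose both cues named in this excerpt to a model is to concatenate a speaker embedding with a one-hot severity label into a single conditioning vector, which a fusion layer such as the one sketched earlier could then consume. Dimensions and the number of severity levels below are illustrative assumptions.

    # Minimal sketch: joint speaker-identity and severity conditioning vector.
    import torch
    import torch.nn.functional as F

    NUM_SEVERITY_LEVELS = 4  # e.g., very low / low / mid / high (assumed)

    def build_condition(spk_emb, severity_idx):
        # spk_emb: (batch, spk_dim); severity_idx: (batch,) integer labels
        sev = F.one_hot(severity_idx, NUM_SEVERITY_LEVELS).float()
        return torch.cat([spk_emb, sev], dim=-1)  # (batch, spk_dim + levels)

    cond = build_condition(torch.randn(2, 192), torch.tensor([1, 3]))
    print(cond.shape)  # torch.Size([2, 196])
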