Interspeech 2021
DOI: 10.21437/interspeech.2021-1241
Age-Invariant Training for End-to-End Child Speech Recognition Using Adversarial Multi-Task Learning

Cited by 5 publications (4 citation statements); References 23 publications.
“…However, our previous work [16] found that the FHVAE does not separate the dysarthria and content information, and speech impairment is identifiable from z_1^(i,n). To obtain dysarthria-invariant features z_2^(i,n), inspired by [17], we introduce adversarial training into the FHVAE model. The data flow of the proposed model is provided in Figure 1.…”
Section: FHVAE With Adversarial Training
confidence: 99%
“…To achieve a greater degree of dysarthria-invariance in the content variable, instead of using forced regularization with phoneme alignment [16], inspired by [17], we introduce adversarial training [18] into the FHVAE model. We feed the FHVAE model with both control and dysarthric speech, regarding the content variable encoder as the generator.…”
Section: Introduction
confidence: 99%
“…Prior research from the speech technology community on child ASR (Stemmer et al., 2003; Shivakumar et al., 2014; Tong et al., 2017; Wu et al., 2019; Shivakumar and Georgiou, 2020; Yeung et al., 2021; Rumberg et al., 2021; Gretter et al., 2021) is compelling. However, these works focused on: (i) older children, including kindergarten (6-15 yrs); (ii) data collected using head-mounted microphones or close-proximity handheld smartphones in clean/controlled settings under adult supervision; and (iii) just one speaker using prompts or read stimuli, with limited spontaneous (not scripted) speech.…”
Section: Challenges Of Developing Automatic
confidence: 99%
“…An alternative to fine-tuning is multitask learning, where the model is trained on both domains simultaneously. Tong et al. [33] explored multitask learning for child and adult speech recognition, and Rumberg et al. [50] constrained the model to learn features that are independent of speaker age, leading to better transfer between the domains. Other work augmented adult speech by making it more similar to child speech; this involved simulating phenomena such as vowel prolongation that are typically associated with speech produced by children [51].…”
Section: Transcription (3)
confidence: 99%
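The citation statements above all describe the same mechanism: a shared encoder trained on the main task while an adversary (an age or dysarthria classifier) is trained against it, typically via a gradient reversal layer, so the shared representation becomes invariant to the nuisance attribute. The following is a minimal numpy sketch of that idea with hand-computed gradients; all names, dimensions, and the single-layer architecture are illustrative assumptions for exposition, not the implementation from the cited papers.

```python
import numpy as np

# Sketch of adversarial multi-task learning with a gradient reversal layer
# (GRL). A shared linear encoder feeds two heads: a task head trained
# normally, and an adversary (age) head whose gradient is sign-flipped
# before it reaches the encoder, pushing the encoder toward age-invariance.

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data: 8 frames of 10-dim features, 4 task classes, 2 age groups.
X = rng.normal(size=(8, 10))
y_task = rng.integers(0, 4, size=8)
y_age = rng.integers(0, 2, size=8)

W_enc = rng.normal(scale=0.1, size=(10, 6))   # shared encoder
W_task = rng.normal(scale=0.1, size=(6, 4))   # main-task head
W_age = rng.normal(scale=0.1, size=(6, 2))    # adversary (age) head
lam = 1.0                                      # GRL scaling factor

# Forward pass
H = X @ W_enc                                  # shared representation
p_task = softmax(H @ W_task)
p_age = softmax(H @ W_age)

# Cross-entropy gradients at the logits (softmax output minus one-hot label)
g_task = p_task.copy(); g_task[np.arange(8), y_task] -= 1
g_age = p_age.copy();   g_age[np.arange(8), y_age] -= 1

# Head weights get ordinary gradients: the age head still learns to
# classify age as well as it can.
dW_task = H.T @ g_task
dW_age = H.T @ g_age

# Encoder gradient: the task gradient flows through unchanged, while the
# adversary's contribution is REVERSED, so the encoder update makes age
# harder, not easier, to predict from H.
dH = g_task @ W_task.T - lam * (g_age @ W_age.T)
dW_enc = X.T @ dH

lr = 0.1
W_enc -= lr * dW_enc
W_task -= lr * dW_task
W_age -= lr * dW_age
```

In a real system the encoder would be a deep acoustic model and the task head a CTC or attention decoder; the only GRL-specific piece is the single sign flip on the adversary's gradient before it enters the shared encoder, scaled by the factor `lam`.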