ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413930

Towards Data Selection on TTS Data for Children’s Speech Recognition

Abstract: Although great progress has been made on automatic speech recognition (ASR) systems, children's speech recognition still remains a challenging task. General ASR systems for children's speech suffer from the lack of corpora and mismatch between children's and adults' speech. Efforts have been made to reduce such mismatch by applying normalization methods to generate modified adults' speech for ASR training. However, modified adults' data can reflect the characteristics of children's speech to a very limited ext…

Citations: cited by 8 publications (3 citation statements)
References: 20 publications (19 reference statements)
“…Most of these approaches consist of various data augmentation techniques for increasing the amount of usable training data. Text-to-Speech based data augmentations as introduced by [14] and [17], where ASR models are finetuned using synthetic data, have not shown significant increases in the accuracy of child ASR. Generative Adversarial Network (GAN) based augmentation [18], [19], [20] has also been explored to increase the amount of labeled data with acoustic attributes like those of child speech.…”
Section: A Related Work (mentioning)
confidence: 99%
“…While data augmentation has been predominantly used to reduce WER in LVCSR, very few researchers adapt data augmentation to handle OOV in ASR [61,62,63]. These studies address issues related to specific words such as proper nouns.…”
Section: OOV Detection and Recovery (mentioning)
confidence: 99%
“…In [14,15,16], adult speech signals are modified using a cycle-consistent generative adversarial network (GAN) to synthetically generate speech data with acoustic attributes similar to child speakers, and the synthetically generated speech is combined with a training set. Synthetic speech signals generated from children's TTS model were added to ASR training to improve performance on children test cases in [17] and [18]. The stochastic feature mapping (SFM) technique was also explored to transform out-of-domain adult data to children's speech data in [19].…”
Section: Related Work (mentioning)
confidence: 99%