2017
DOI: 10.1109/lsp.2017.2756347

Effect of Prosody Modification on Children's ASR

Cited by 33 publications (14 citation statements)
References: 18 publications

“…The second factor that decreases the recognition rate is the speaking rate of the adult and child speakers. The phoneme duration of the children's speakers is longer as compared to the adults [4]. Thus, the speaking duration of the children's speakers is slower than the adult speakers [7,8].…”
Section: Introduction · mentioning · confidence: 92%
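
The statement above attributes part of the mismatch to children's slower speaking rate. One common way to change speaking rate without changing sample rate is time-scale modification; for example, one could speed child speech up toward adult rates (rate > 1) or slow adult training speech down (rate < 1). The sketch below uses plain windowed overlap-add (OLA) in NumPy. The helper name `ola_time_stretch` and its frame and hop sizes are assumptions for illustration, and this is not presented as the prosody-modification method used in the paper. Plain OLA does not align pitch periods, so WSOLA- or PSOLA-style alignment is usually needed for clean speech.

```python
import numpy as np


def ola_time_stretch(x: np.ndarray, rate: float,
                     frame_len: int = 512, hop_out: int = 128) -> np.ndarray:
    """Scale the duration of `x` by roughly 1/rate using plain overlap-add.

    rate < 1 slows speech down (longer output), rate > 1 speeds it up.
    Hypothetical helper for illustration; frames are not phase-aligned,
    so audible artifacts are expected on real speech.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    hop_in = hop_out * rate  # analysis hop on the input, in samples
    n_frames = int((len(x) - frame_len) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for k in range(n_frames):
        start_in = int(k * hop_in)  # floor keeps the frame inside x
        start_out = k * hop_out     # synthesis hop is fixed
        y[start_out:start_out + frame_len] += window * x[start_in:start_in + frame_len]
        norm[start_out:start_out + frame_len] += window
    # Undo the summed window gain except at the edges, where it is ~0.
    mask = norm > 1e-3
    y[mask] /= norm[mask]
    return y


sr = 16000
t = np.arange(sr) / sr
speech_like = np.sin(2 * np.pi * 220 * t)  # 1 s test tone standing in for speech
slower = ola_time_stretch(speech_like, rate=0.8)
print(len(speech_like), len(slower))  # → 16000 19840
```

The output length is close to, but not exactly, 16000 / 0.8 = 20000 samples because only whole analysis frames are used.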
“…From the literature, it was also found that the pitch of the children is quite different and higher than the adult's speech. This is one of the factors that make children's speech different from adult speech and causes acoustic mismatch [4,5]. The range of the pitch frequency mainly lies between 70 Hz to 255 Hz for the adult speakers whereas for children's pitch frequency ranges usually from 200 Hz to 350 Hz [4][5][6].…”
Section: Introduction · mentioning · confidence: 99%
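
The quoted ranges (70 to 255 Hz for adults, 200 to 350 Hz for children) can be checked against a fundamental-frequency (F0) estimate. The sketch below uses a basic autocorrelation pitch estimator on a single voiced frame. The helper names, the 60 to 400 Hz search band, and the synthetic test tones are assumptions for illustration, not the method of any cited work. Note that the two quoted ranges overlap between 200 and 255 Hz, so F0 alone cannot always separate child from adult speech.

```python
import numpy as np

# F0 ranges as quoted in the citation statement above, in Hz.
F0_RANGES = {"adult": (70.0, 255.0), "child": (200.0, 350.0)}


def estimate_f0(frame: np.ndarray, sr: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate F0 of one voiced frame from its autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                       # shortest period searched
    lag_max = min(int(sr / fmin), len(frame) - 1)  # longest period searched
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / best_lag


def matching_ranges(f0: float) -> list[str]:
    """Return the speaker groups whose quoted F0 range contains f0."""
    return [name for name, (lo, hi) in F0_RANGES.items() if lo <= f0 <= hi]


sr = 16000
t = np.arange(1024) / sr  # one 64 ms frame
for true_f0 in (120.0, 230.0, 300.0):
    f0 = estimate_f0(np.sin(2 * np.pi * true_f0 * t), sr)
    print(true_f0, round(f0, 1), matching_ranges(f0))
```

Because the lag is an integer number of samples, the estimate is quantized (about 2 to 6 Hz steps at these frequencies); real pitch trackers add interpolation and voicing decisions.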
“…In the context of children speech, prosodic features and modifications are well studied [2,11,13,15,16]. Prior work [16] has leveraged similar prosody modifications for data augmentation in children ASR achieving substantial gains in performance.…”
Section: Related Work · mentioning · confidence: 99%
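
Prosody-based augmentation creates extra training copies of each utterance with altered pitch and/or speaking rate. As a simpler stand-in (not the prosody-modification technique of [16]), the sketch below applies resampling-based speed perturbation, which changes tempo, pitch, and formant positions together by the same factor. The function names and the factors 0.9, 1.0, and 1.1 are assumptions for illustration; linear interpolation keeps the example dependency-free, whereas a band-limited resampler would be preferable in practice.

```python
import numpy as np


def speed_perturb(x: np.ndarray, factor: float) -> np.ndarray:
    """Play `x` back `factor` times faster at the same sample rate.

    factor > 1 shortens the signal and raises pitch and formants;
    factor < 1 lengthens it and lowers them. Linear interpolation only.
    """
    n_out = int(round(len(x) / factor))
    src_pos = np.arange(n_out) * factor  # fractional read positions in x
    return np.interp(src_pos, np.arange(len(x)), x)


def augment(utterances: list[np.ndarray],
            factors: tuple[float, ...] = (0.9, 1.0, 1.1)) -> list[np.ndarray]:
    """Return one perturbed copy of every utterance per factor."""
    return [speed_perturb(u, f) for u in utterances for f in factors]


rng = np.random.default_rng(0)
corpus = [rng.standard_normal(16000), rng.standard_normal(8000)]
augmented = augment(corpus)
print([len(a) for a in augmented])  # → [17778, 16000, 14545, 8889, 8000, 7273]
```

Keeping factor 1.0 in the list retains the unmodified audio alongside the perturbed copies, tripling the training set in this configuration.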
“…To alleviate data scarcity, we augment training audio data. Specifically, we compare SpecAugment- [1] and prosody-based [2] data augmentation (section 4). SpecAugment, recently popularized for building a robust ASR, has not been explored for processing children's speech.…”
Section: Introduction · mentioning · confidence: 99%
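
SpecAugment, cited above as [1], augments the log-mel spectrogram directly rather than the waveform. The sketch below implements only its frequency- and time-masking steps in NumPy; time warping and any cap on mask length relative to utterance length are omitted. The parameter names F and T follow SpecAugment's notation for the maximum mask widths, while the default values, the helper name, and filling masks with the spectrogram mean (equivalent to zero-filling for mean-normalized features) are assumptions for illustration.

```python
import numpy as np


def spec_augment(spec: np.ndarray, rng: np.random.Generator,
                 F: int = 15, num_freq_masks: int = 2,
                 T: int = 40, num_time_masks: int = 2) -> np.ndarray:
    """Frequency and time masking on a (n_mels, n_frames) spectrogram.

    Returns a masked copy; the input array is left unchanged.
    """
    out = spec.copy()
    n_freq, n_time = out.shape
    fill = spec.mean()
    for _ in range(num_freq_masks):
        f = int(rng.integers(0, min(F, n_freq) + 1))  # mask width, uniform in 0..F
        f0 = int(rng.integers(0, n_freq - f + 1))     # first masked mel bin
        out[f0:f0 + f, :] = fill
    for _ in range(num_time_masks):
        t = int(rng.integers(0, min(T, n_time) + 1))  # mask length, uniform in 0..T
        t0 = int(rng.integers(0, n_time - t + 1))     # first masked frame
        out[:, t0:t0 + t] = fill
    return out


rng = np.random.default_rng(42)
log_mel = rng.standard_normal((80, 300))  # stand-in for an 80-bin log-mel spectrogram
masked = spec_augment(log_mel, rng)
print(masked.shape, int(np.sum(masked != log_mel)))
```

Applying the masks on the fly with a fresh random draw per training step is what makes each epoch see a differently corrupted view of the same utterance.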
“…As discussed earlier, the speech data of child speakers differ from the adults due to pitch and speaking rate. In the case of child speaker, formant scaling also occurs due to smaller vocal tract geometry [42,43]. Child speakers have higher formant frequencies than adults.…”
Section: Effect of Data-Augmented Training on VMD-MFCC Features · mentioning · confidence: 99%
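
The link between a shorter vocal tract and higher formants can be made concrete with the textbook uniform-tube (quarter-wavelength resonator) approximation, in which the n-th formant is F_n = (2n - 1)c / (4L) for tract length L and speed of sound c. The sketch assumes c = 350 m/s, a 17.5 cm adult tract, and a hypothetical 12 cm child tract. Real vocal tracts are not uniform tubes, so the numbers indicate only the direction and rough size of the shift.

```python
SPEED_OF_SOUND_M_S = 350.0  # approximate value for warm, humid air in the vocal tract


def tube_formants(length_m: float, n_formants: int = 3,
                  c: float = SPEED_OF_SOUND_M_S) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at the glottis, open at the lips."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]


ADULT_TRACT_M = 0.175  # typical adult male value used in textbooks
CHILD_TRACT_M = 0.12   # hypothetical child value, for illustration only

adult = tube_formants(ADULT_TRACT_M)
child = tube_formants(CHILD_TRACT_M)
warp = ADULT_TRACT_M / CHILD_TRACT_M  # every formant scales by the same factor
print([round(f) for f in adult])  # → [500, 1500, 2500]
print([round(f) for f in child], round(warp, 3))
```

Because all formants scale by the single ratio L_adult / L_child in this model, a linear frequency warp of the spectrum is a natural first-order way to compensate for the mismatch.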