Acoustic variability and automatic recognition of children’s speech

Gerosa, Matteo; Giuliani, Diego; Brugnara, Fabio

doi:10.1016/j.specom.2007.01.002

Cited by 121 publications

(82 citation statements)

References 47 publications

“…Also, for ASR of adults' speech, the improvements obtained are less than that for ASR of children's speech under mismatched conditions. This is because the variances for the observation densities of the phone models are greater for the poor models trained on child speech than for the models trained on adult speech [2,8]. This means that the Gaussian densities are more scattered and thus less separable in the acoustic feature space for models trained on child speech.…”

Section: C) Proposed Algorithm For Adaptive MFCC Feature Truncation F

mentioning

confidence: 99%
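The separability argument in the statement above can be illustrated with a minimal sketch. It assumes two one-dimensional Gaussian observation densities and uses the Bhattacharyya distance as the overlap measure; that measure, the function names, and all numbers are illustrative assumptions, not taken from the cited papers. With the means held fixed, a larger variance gives a smaller distance, meaning more overlap and a looser bound on classification error.

```python
import math


def bhattacharyya_distance(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two 1-D Gaussians N(mu1, var1), N(mu2, var2)."""
    # Term driven by the separation of the means, scaled by the summed variances.
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    # Term driven by the mismatch between the two variances (zero when equal).
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def bayes_error_bound(mu1, var1, mu2, var2):
    """Bhattacharyya upper bound on two-class Bayes error, assuming equal priors."""
    return 0.5 * math.exp(-bhattacharyya_distance(mu1, var1, mu2, var2))


# Hypothetical phone-model densities: same means, different variances.
tight = bhattacharyya_distance(0.0, 1.0, 2.0, 1.0)   # smaller variances
broad = bhattacharyya_distance(0.0, 4.0, 2.0, 4.0)   # larger variances

tight_bound = bayes_error_bound(0.0, 1.0, 2.0, 1.0)
broad_bound = bayes_error_bound(0.0, 4.0, 2.0, 4.0)

print(f"distance (small variance): {tight:.3f}, error bound: {tight_bound:.3f}")
print(f"distance (large variance): {broad:.3f}, error bound: {broad_bound:.3f}")
```

Real phone models use multivariate Gaussian mixtures, so this one-dimensional case only shows the direction of the effect, not its size.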

“…VTLN and CMLLR are the two effective techniques in the literature that are used to reduce acoustic mismatch between adult speech and child speech [8,11]. Our proposed algorithm for adaptive MFCC feature truncation also addresses acoustic mismatch and has given significant improvements in performance.…”

Section: D) Combining Proposed Algorithm With VTLN And/or CMLLR

mentioning

confidence: 99%

“…as maximum a posteriori and maximum likelihood linear regression (MLLR) adaptations [8], constrained MLLR (CMLLR) adaptation [8,10], speaker adaptive training (SAT) [10], constrained MLLR-based speaker normalization [11], and their combinations [8]. Significant improvements have been reported in ASR performance on children's speech under mismatched conditions using each of these speaker adaptation methods.…”

mentioning

confidence: 99%

Adaptive feature truncation to address acoustic mismatch in automatic recognition of children's speech

Ghai

Sinha

2016

SIP

“…In (McGowan and Nittrouer, 1988;Nittrouer and Whalen, 1989;Lee et al, 1999;Narayanan and Potamianos, 2002;Gerosa et al, 2007), it was shown that acoustic and linguistic characteristics of children's speech are widely different from those of adults. Furthermore, these studies also show that characteristics of children's speech vary rapidly as a function of age due to the anatomical and physiological changes occurring during a child's growth and because children become more skilled in coarticulation with age.…”

Section: Speech Corpora

mentioning

confidence: 99%

Towards age-independent acoustic modeling

Gerosa

Giuliani

Brugnara

2009

Speech Communication

Self Cite

In automatic speech recognition applications, due to significant differences in voice characteristics, adults and children are usually treated as two population groups, for which different acoustic models are trained. In this paper, age-independent acoustic modeling is investigated in the context of large vocabulary speech recognition. Exploiting a small amount (9 hours) of children's speech and a more significant amount (57 hours) of adult speech, age-independent acoustic models are trained using several methods for speaker adaptive acoustic modeling. Recognition results achieved using these models are compared with those achieved using age-dependent acoustic models for children and adults, respectively. Recognition experiments are performed on four Italian speech corpora, two consisting of children's speech and two of adult speech, using 64k word and 11k word trigram language models. Methods for speaker adaptive acoustic modeling prove to be effective for training age-independent acoustic models ensuring recognition results at least as good as those achieved with age-dependent acoustic models for adults and children.

“…These research findings motivated us to collect a corpus of children conversational data. In fact, the few existing corpora of children speech turned out to be not usable in our system for none of them was in Danish and moreover consisted of either prompted speech or monologues of children recounting stories (D'Arcy et al, 2004;Eskenazi, 1996;Gerosa & Giuliani, 2004;Hagen et al, 1996). We transcribed and analyzed several hours of collected video and audio-taped conversation of young subjects involved in a series of interactive sessions in both Wizard of Oz studies and in an after-school class where they played with a real human actor impersonating Hans Christian Andersen (Figure 3, left).…”

Section: Children Spoken Language Recognition: Issues

mentioning

confidence: 99%