Waquar Ahmad scite author profile

Transcribing children's speech with statistical models trained on adults' speech is very challenging. A large mismatch between the acoustic and linguistic attributes of the training and test data is reported to degrade recognition performance. In such tasks, the difference in pitch (fundamental frequency) between the two groups of speakers is one of several mismatch factors. To reduce the pitch mismatch, this work explores an existing pitch scaling technique based on iterative spectrogram inversion. Explicit pitch scaling is found to improve the recognition of children's speech in the mismatched setup. In addition, we study the effect of discarding the phase information during spectrum reconstruction, motivated by the fact that the dominant acoustic feature extraction techniques use only the magnitude spectrum. Under the mismatched testing scenario, the existing and the modified pitch scaling techniques yield very similar recognition performance. Furthermore, we explore the role of pitch scaling in a second speech recognition system trained on speech data from both adult and child speakers. Pitch scaling is effective for children's speech recognition in this case as well.
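The abstract does not specify the reconstruction algorithm, so the following is only a rough illustration under stated assumptions. It takes the classic Griffin-Lim algorithm as the iterative spectrogram inversion step and performs pitch scaling in three stages: (1) time-stretch the magnitude spectrogram by a factor alpha through linear frame interpolation, (2) iteratively re-estimate a phase consistent with those magnitudes, and (3) resample by alpha to restore the original duration, which scales every frequency by alpha. Because only magnitudes are carried forward, the original phase is discarded. All names (`stft`, `istft`, `griffin_lim`, `pitch_scale`) and settings (FFT size 512, hop 128, 50 iterations) are hypothetical choices, not the authors' configuration. The linear-interpolation resampler also omits an anti-aliasing filter for brevity.

```python
import numpy as np

# Hypothetical analysis settings (assumed, not taken from the paper).
N_FFT, HOP = 512, 128


def stft(x, n_fft=N_FFT, hop=HOP):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out_len = n_fft + hop * (X.shape[0] - 1)
    y = np.zeros(out_len)
    wsum = np.zeros(out_len)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame * win
        wsum[i * hop:i * hop + n_fft] += win ** 2
    nz = wsum > 1e-8  # avoid dividing by zero where the window sum vanishes
    y[nz] /= wsum[nz]
    return y


def griffin_lim(mag, n_iter=50, n_fft=N_FFT, hop=HOP, seed=0):
    """Iteratively estimate a phase consistent with the target magnitude `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        X = stft(istft(mag * phase, n_fft, hop), n_fft, hop)
        phase = np.exp(1j * np.angle(X))  # keep the new phase, re-impose target magnitude
    return istft(mag * phase, n_fft, hop)


def pitch_scale(x, alpha, n_fft=N_FFT, hop=HOP, n_iter=50):
    """Scale pitch by `alpha` (alpha > 1 raises it) while keeping duration roughly fixed."""
    mag = np.abs(stft(x, n_fft, hop))  # the original phase is discarded here
    n_in = mag.shape[0]
    n_out = max(1, int(round(n_in * alpha)))
    # 1) Time-stretch by alpha: linearly interpolate between magnitude frames.
    pos = np.linspace(0, n_in - 1, n_out)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = (pos - lo)[:, None]
    stretched = (1 - frac) * mag[lo] + frac * mag[hi]
    # 2) Reconstruct a waveform from the stretched magnitudes.
    y = griffin_lim(stretched, n_iter, n_fft, hop)
    # 3) Resample by alpha: restores the duration and multiplies all frequencies by alpha.
    n_new = int(round(len(y) / alpha))
    return np.interp(np.arange(n_new) * alpha, np.arange(len(y)), y)
```

For example, calling `pitch_scale(x, 1.25)` on a 200 Hz tone sampled at 16 kHz should return a signal of roughly the same length whose dominant frequency lies close to 250 Hz. Separating time stretching from resampling keeps the iterative inversion purely in the magnitude domain, which is the property the abstract's phase-discarding variant relies on.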


Cosine Distance Metric Learning for Speaker Verification Using Large Margin Nearest Neighbor Method

Ahmad, Karnick, Hegde (2014)




Voice Conversion Based Data Augmentation to Improve Children’s Speech Recognition in Limited Data Scenario

Role of Prosodic Features on Children's Speech Recognition

Improving Children's Speech Recognition Through Time Scale Modification Based Speaking Rate Adaptation

Improving Children’s Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion

Cosine Distance Metric Learning for Speaker Verification Using Large Margin Nearest Neighbor Method
