Since 1990 the DRA Speech Research Unit has conducted research into applications of speech recognition technology to speech and language development for young children. This has been done in collaboration with Hereford and Worcester County Council Education Department (HWCC) and, more recently, with Sherston Software Limited, one of the UK's leading independent educational software publishers.

An initial project, known as STAR (Speech Training Aid Research), was prompted by HWCC's awareness of a requirement by teachers for a computerised 'Speech Training Aid' tool to aid young children in the development of a range of communications and language skills. The goal was to develop a computer-based system which was able to distinguish between 'good' and 'poor' pronunciations of a word, spoken by a child in response to a textual, pictorial or verbal prompt, from a 1,000-word children's vocabulary.

The same speech recognition technology has subsequently been integrated into Sherston Software's commercially successful range of animated 'Talking Books', which use stored digitised speech to enable the computer to read words out loud to a child. This converts them into 'Talking & Listening Books' which, in addition to the existing functions, are able to 'listen' to a child reading and indicate words which have been read incorrectly.

THE CHALLENGE

The use of automatic speech recognition in computer-based tools for speech and language development in children has enormous potential. While such tools are unlikely to be a substitute for the human interaction which occurs when a teacher or parent helps a child learn to read, they could vastly increase the individual assistance which a child receives, and allow valuable time with the teacher or parent to be used more effectively. Given these advantages, and the economic importance of literacy, it is not surprising that this problem is receiving attention from the speech technology research community (see, for example, [1]).

From the perspective of speech technology, the question posed by HWCC was whether automatic speech recognition can be used to distinguish between 'good' and 'poor' pronunciations of a known word spoken by an unknown child. This raises the emotive question of what constitutes a 'good' or 'poor' pronunciation. Jones [2] defines 'poor' speech as a way of talking which it is difficult for most people to understand, caused by mumbling or the lack of definiteness of utterance. By contrast, 'good' pronunciation will enable a child to participate confidently in public, cultural and working life, and will aid accurate reading and spelling. 'Good' pronunciation occurs within the context of a variety of regional accents, and is clearly not the same as Received Pronunciation ('BBC English'). Factors such as a child's confidence in speaking are also relevant.

Assuming that 'good' and 'poor' pronunciation can be identified, there remains the question of whether current speech pattern processing techniques are sufficiently accurate to make the required distinction. This complements ...
A deep neural network (DNN)-based model has been developed to predict non-parametric distributions of durations of phonemes in specified phonetic contexts and used to explore which factors influence durations most. Major factors in US English are pre-pausal lengthening, lexical stress, and speaking rate. The model can be used to check that text-to-speech (TTS) training speech follows the script and words are pronounced as expected. Duration prediction is poorer with training speech for automatic speech recognition (ASR) because the training corpus typically consists of single utterances from many speakers and is often noisy or casually spoken. Low probability durations in ASR training material nevertheless mostly correspond to non-standard speech, with some having disfluencies. Children's speech is disproportionately present in these utterances, since children show much more variation in timing.
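The idea of flagging low-probability durations can be illustrated with a minimal sketch. It assumes each phoneme already has a predicted non-parametric distribution over duration bins, written here as a plain list of probabilities. The bin layout, the example values, the threshold and the function names `duration_log_prob` and `flag_low_probability` are all hypothetical and are not taken from the model described above.

```python
import math


def duration_log_prob(predicted, observed_bin, floor=1e-6):
    """Log-probability of the observed duration bin under a predicted histogram.

    `predicted` is a list of non-negative weights over duration bins
    (hypothetical layout); it is normalised here so it need not sum to 1.
    `floor` avoids log(0) for bins the model considers impossible.
    """
    total = sum(predicted)
    p = predicted[observed_bin] / total if total > 0 else 0.0
    return math.log(max(p, floor))


def flag_low_probability(phones, threshold=-4.0):
    """Return labels of phones whose observed duration is improbable.

    `phones` is a list of (label, predicted_distribution, observed_bin).
    `threshold` is an illustrative log-probability cut-off, not a tuned value.
    """
    flagged = []
    for label, predicted, observed_bin in phones:
        if duration_log_prob(predicted, observed_bin) < threshold:
            flagged.append(label)
    return flagged


# Made-up example: three phones, four duration bins each.
utterance = [
    ("k", [0.10, 0.60, 0.25, 0.05], 1),   # typical duration
    ("ae", [0.05, 0.30, 0.50, 0.15], 2),  # typical duration
    ("t", [0.70, 0.25, 0.04, 0.01], 3),   # much longer than predicted
]
flagged = flag_low_probability(utterance)
print(flagged)
```

In practice the cut-off would have to be chosen empirically, and flagged items would only be candidates for manual review, since an unusual duration may reflect a disfluency, a script mismatch, or simply a child's more variable timing.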