The Nemours database is a collection of 814 short nonsense sentences: 74 sentences spoken by each of 11 male speakers with varying degrees of dysarthria. Additionally, the database contains two connected-speech paragraphs produced by each of the 11 speakers. The database was designed to test the intelligibility of dysarthric speech before and after enhancement by various signal processing methods, and is available on CD-ROM. It can also be used to investigate general characteristics of dysarthric speech, such as production error patterns. The entire database has been marked at the word level, and sentences for 10 of the 11 talkers have been marked at the phoneme level as well. This paper describes the database structure and the techniques adopted to improve the performance of a Discrete Hidden Markov Model (DHMM) labeler used to assign initial phoneme labels to the elements of the database. These techniques may be useful in the design of automatic recognition systems for persons with speech disorders, especially when limited amounts of training data are available.
We will demonstrate the ModelTalker Voice Recorder (MT Voice Recorder), an interface system that lets individuals record and bank a speech database for the creation of a synthetic voice. The system guides users through an automatic calibration process that sets pitch, amplitude, and silence. The system then prompts users with both visual (text-based) and auditory prompts. Each recording is screened for pitch, amplitude, and pronunciation, and users are given immediate feedback on the acceptability of each recording. Users can then rerecord an unacceptable utterance. Recordings are automatically labeled and saved, and a speech database is created from these recordings. The system is intended to make the process of recording a corpus of utterances relatively easy for those inexperienced in linguistic analysis. Ultimately, the recorded corpus and the resulting speech database are used for concatenative synthetic speech, thus allowing individuals at home or in clinics to create a synthetic voice from their own voice. The interface may prove useful for other purposes as well. The system facilitates the recording and labeling of large corpora of speech, making it useful for speech and linguistic research, and it provides immediate feedback on pronunciation, making it useful as a clinical learning tool.
Digital recordings of children producing the names "Rhonda" and "Wanda," and/or "Toto" and "Coco" were made using the microphone input to a Toshiba laptop computer (16-bit samples, 22.05-kHz sampling rate) with an AKG C410/B head-mounted condenser microphone. These names were associated with animated characters in a mock video game running on the laptop under the control of a speech-language pathologist. The children, ranging in age from four to six years, were undergoing speech therapy at the Alfred I. duPont Hospital for Children for one or both of two common articulation errors: /w/ substituted for /r/, and/or /t/ substituted for /k/. The initial segment in each recorded utterance was classified by laboratory staff as either r/w or t/k, and assigned a goodness rating. Discrete Hidden Markov phoneme Models (DHMMs) trained using data recorded from normally articulating children were then used to classify the same utterances, and the results of the automatic classification were compared to the human classification. Results indicate that appropriately trained DHMMs can provide accurate classification of utterances from children in speech therapy. This technology could support articulation drill on home computer systems as an adjunct to speech therapy. [Work supported by Nemours Research Programs.]
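The comparison of automatic and human classification described above could be tallied roughly as follows. This is a minimal sketch, not the authors' procedure: the abstract does not say which agreement measure was used, so simple percent agreement and a confusion table are assumed here, and the function names and label values are hypothetical and made up for illustration.

```python
from collections import Counter


def agreement_rate(human_labels, auto_labels):
    """Fraction of utterances where the automatic label matches the human label."""
    if len(human_labels) != len(auto_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(h == a for h, a in zip(human_labels, auto_labels))
    return matches / len(human_labels)


def confusion_table(human_labels, auto_labels):
    """Count each (human label, automatic label) pair."""
    return Counter(zip(human_labels, auto_labels))


# Made-up labels for the initial segment of four /r/-/w/ utterances.
human = ["r", "w", "r", "w"]
auto = ["r", "w", "w", "w"]

rate = agreement_rate(human, auto)
table = confusion_table(human, auto)
```

The confusion table keeps the direction of each disagreement (for example, a human "r" classified automatically as "w"), which a single agreement figure hides.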
An HMM labeler has been extended to detect poor correspondence between phonetic labels and underlying acoustic data. This paper will present work extending the labeler to model perceptual confusions of human listeners from a forced-choice word identification experiment which used dysarthric speech. The speech and perception data are from the Nemours Dysarthric Speech database [Menendez et al., Proceedings of ICSLP 96, SaP2P1.19 (1996)]. The perceptual data comprise distributions of listener identification responses over sets of four to six words (the intended word plus several phonetically similar foils). In all, 37 words were produced twice by each of 10 dysarthric talkers, providing a total data set of 740 items. Each of these items was identified at least 12 times by each of five naive listeners, for a total of at least 60 responses per item. Half of this data set will be used to adapt parameters of the HMM labeler to reproduce the distribution of human responses to the speech. The remaining half of the data set will be used to assess the ability of the labeler to select phonetic responses in a manner reflecting patterns of human perceptual confusions among the response set items. [Work supported by the Nemours Research Programs and NIDRR.]
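The counts in this abstract, and the idea of comparing a labeler's response distribution with the listeners' distribution for one item, can be sketched as below. This is only an illustration under stated assumptions: the abstract does not name a comparison measure, so total variation distance is assumed, and the function names, response words, and model probabilities are all hypothetical.

```python
from collections import Counter

# Counts stated in the abstract.
WORDS, REPETITIONS, TALKERS = 37, 2, 10
LISTENERS, MIN_IDENTIFICATIONS_PER_LISTENER = 5, 12

total_items = WORDS * REPETITIONS * TALKERS
min_responses_per_item = LISTENERS * MIN_IDENTIFICATIONS_PER_LISTENER


def response_distribution(responses):
    """Turn a list of chosen words into relative frequencies."""
    counts = Counter(responses)
    n = len(responses)
    return {word: count / n for word, count in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones.

    Assumed measure for illustration; the abstract does not specify one.
    """
    words = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words)


# Made-up listener responses for one item (60 responses over three words).
human = response_distribution(["bad"] * 30 + ["bat"] * 20 + ["bed"] * 10)
# Made-up labeler output for the same item.
model = {"bad": 0.5, "bat": 0.25, "bed": 0.25}

distance = total_variation(human, model)
```

A smaller distance means the labeler's choices among the response words more closely follow the pattern of listener confusions for that item.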