Text-to-speech systems are currently designed to work on complete sentences and paragraphs, which gives front-end processors access to large amounts of linguistic context. Problems with this design arise when applications require text to be synthesized in near real time, as it is being typed. How does the system decide which incoming words should be collected and synthesized as a group when prior and subsequent word groups are unknown? We describe a rule-based parser that uses a three-cell buffer and phrasing rules to identify break points in incoming text. Words up to the break point are synthesized as new text is moved into the buffer; no hierarchical structure is built beyond the lexical level. The parser was developed for use in a system that synthesizes written telecommunications by Deaf and hard of hearing people. These texts are written entirely in upper case, with little or no punctuation, in a nonstandard variety of English (e.g. WHEN DO I WILL CALL BACK YOU). The parser performed well in a three-month field trial involving tens of thousands of texts. Laboratory tests indicate that the parser exhibited a low error rate when compared with a human reader.
In this paper, we concern ourselves with an application of text-to-speech for speech-impaired, deaf, and hard of hearing people. The application is unusual because it requires real-time synthesis of unedited, spontaneously generated conversational texts transmitted via a Telecommunications Device for the Deaf (TDD). We describe a parser that we have implemented as a front end for a version of the Bell Laboratories text-to-speech synthesizer (Olive and Liberman 1985). The parser prepares TDD texts for synthesis by (a) performing lexical regularization of abbreviations and some non-standard forms, and (b) identifying prosodic phrase boundaries. Rules for identifying phrase boundaries are derived from the prosodic phrase grammar described in Bachenko and Fitzpatrick (1990). Following the parent analysis, these rules use a mix of syntactic and phonological factors to identify phrase boundaries but, unlike the parent system, they forgo building any hierarchical structure in order to bypass the need for a stacking mechanism; this permits the system to operate in near real time. As a component of the text-to-speech system, the parser has undergone rigorous testing during a successful three-month field trial at an AT&T telecommunications center in California. In addition, laboratory evaluations indicate that the parser's performance compares favorably with human judgments about phrasing.
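To make the buffering idea concrete, the following is a minimal Python sketch of a three-cell buffer that regularizes incoming words and emits a phrase whenever a break point is found. It is an illustration under stated assumptions, not the parser described here: the abbreviation table, the `PHRASE_STARTERS` word list, the `should_break` rule, and the `min_len` length threshold are all hypothetical stand-ins for the actual lexicon and for the phrasing rules derived from Bachenko and Fitzpatrick (1990).

```python
from collections import deque
from typing import Iterable, Iterator, List

# Illustrative abbreviation table (an assumption, not the paper's lexicon).
ABBREVIATIONS = {"PLS": "PLEASE", "U": "YOU", "TMW": "TOMORROW"}

# Hypothetical closed class of words that tend to open a new phrase.
PHRASE_STARTERS = {"AND", "BUT", "BECAUSE", "WHEN", "IF", "SO"}


def regularize(word: str) -> str:
    """Upper-case a word and expand it if it is a known abbreviation."""
    word = word.upper()
    return ABBREVIATIONS.get(word, word)


def should_break(left: str, mid: str, right: str) -> bool:
    """Toy rule: break after `left` when `mid` opens a run of phrase starters.

    `right` is the lookahead cell; this toy rule does not consult it, but a
    richer rule set could.
    """
    return mid in PHRASE_STARTERS and left not in PHRASE_STARTERS


def phrases(words: Iterable[str], min_len: int = 2) -> Iterator[List[str]]:
    """Yield word groups for synthesis as text streams through the buffer."""
    buffer: deque = deque()    # never holds more than three cells
    pending: List[str] = []    # words released since the last break point
    for raw in words:
        buffer.append(regularize(raw))
        if len(buffer) < 3:
            continue
        # Buffer is full: release the oldest word, then test for a break
        # between it and the two words still in the buffer.
        left = buffer.popleft()
        pending.append(left)
        if should_break(left, buffer[0], buffer[1]) and len(pending) >= min_len:
            yield pending
            pending = []
    # End of input: flush whatever remains as the final phrase.
    pending.extend(buffer)
    if pending:
        yield pending


if __name__ == "__main__":
    text = "PLS CALL ME TMW BECAUSE I WILL BE HOME"
    for group in phrases(text.split()):
        print(" ".join(group))
```

Because only a flat list of pending words and three buffer cells are kept, no stack or tree is needed, which mirrors the motivation given above for forgoing hierarchical structure. The length threshold is a crude stand-in for the phonological factors the real rules take into account.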