The new AT&T Text-To-Speech (TTS) system for general U.S. English text is based on best-choice components of the AT&T Flextalk TTS, the Festival System from the University of Edinburgh, and ATR's CHATR system. From Flextalk, it employs text normalization, letter-to-sound, and prosody generation. Festival provides a flexible and modular architecture for easy experimentation and competitive evaluation of different algorithms or modules. In addition, we adopted CHATR's unit selection algorithms and modified them in an attempt to guarantee high intelligibility under all circumstances. Finally, we have added our own Harmonic plus Noise Model (HNM) backend for synthesizing the output speech. Most decisions made during the research and development phase of this system were based on formal subjective evaluations. We feel that the new system goes a long way toward delivering on the long-standing promise of truly natural-sounding, as well as highly intelligible, synthesis.
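The abstract says the system adopts and modifies CHATR's unit selection but does not spell out the search. The formulation commonly associated with ATR's CHATR work treats unit selection as a lattice search. One candidate database unit is chosen per target position so that the sum of target costs (mismatch between a unit and the desired specification) and join costs (mismatch at each concatenation point) is minimal, and this is solved with Viterbi-style dynamic programming. The sketch below illustrates only that generic formulation under toy assumptions. The function name `select_units`, the pitch-only cost functions and the 0.5 join weight are hypothetical. It is not the AT&T system's modified algorithm.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # a target specification (here: a desired pitch value)
U = TypeVar("U")  # a candidate unit from the speech database


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> tuple[list[U], float]:
    """Pick one candidate per target, minimizing total target + join cost.

    Viterbi-style dynamic programming over the candidate lattice.
    Assumes every position has at least one candidate.
    """
    if not targets:
        return [], 0.0
    # cost[i][j]: cheapest cumulative cost of any path ending at candidates[i][j]
    cost = [[target_cost(targets[0], u) for u in candidates[0]]]
    # back[i][j]: index of the best predecessor at position i - 1
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at the previous position.
            k_best, c_best = min(
                ((k, cost[i - 1][k] + join_cost(prev, u))
                 for k, prev in enumerate(candidates[i - 1])),
                key=lambda kc: kc[1],
            )
            row_cost.append(c_best + target_cost(targets[i], u))
            row_back.append(k_best)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest unit at the final position.
    j = min(range(len(cost[-1])), key=lambda idx: cost[-1][idx])
    total = cost[-1][j]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Toy example: units are (id, pitch_hz); both costs look only at pitch (hypothetical).
targets = [100, 120, 110]
candidates = [
    [("a1", 100), ("a2", 130)],
    [("b1", 118), ("b2", 125)],
    [("c1", 110), ("c2", 90)],
]
path, total = select_units(
    targets,
    candidates,
    target_cost=lambda t, u: abs(t - u[1]),
    join_cost=lambda p, u: 0.5 * abs(p[1] - u[1]),
)
print([u[0] for u in path], total)  # ['a1', 'b1', 'c1'] 15.0
```

A production system would use much richer spectral, prosodic and contextual features in both costs and would prune the lattice for speed. Whatever changes AT&T made to favor intelligibility are not shown here.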
The quality of speech synthesis has come a long way since Homer Dudley's "Voder" in 1939. In fact, with the widespread use of unit-selection synthesizers, the naturalness of synthesized speech is now high enough to pass the Turing test for short utterances, such as voice prompts. It is therefore fair to ask: what are the next challenges for TTS research? This paper tries to identify unsolved issues whose solutions would greatly advance the state of the art in TTS.
WebTalk is a system for analyzing unstructured information from company websites to support the automatic creation of spoken dialog applications. The goal is to fully automate the process of building, maintaining and deploying dialog applications by leveraging the wealth of information on the World Wide Web. WebTalk employs technologies from web mining, document understanding, question answering, and speech and language processing. In this paper, we review the extensions to these technologies that make them suitable for creating a WebTalk application. We present an evaluation study of a WebTalk spoken dialog system instantiated on a telecom company's website. Experiments with 30 different scenarios show promising results and provide evidence that such systems could fundamentally change how spoken dialog services are created and scaled.