A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis

Yi, Jon Rong-Wei; Glass, James; Hetherington, I. Lee

doi:10.21437/icslp.2000-541

Cited by 17 publications

(6 citation statements)

References 9 publications


“…The dialogue manager receives the context-resolved semantic frame and communicates with the database and language generation [3] services to provide an appropriate reply to the user. This response is then audibly realized by the text-to-speech server [52].…”

Section: Results (mentioning)

confidence: 99%

A context resolution server for the galaxy conversational systems

Filisko, Seneff

2003

8th European Conference on Speech Communication and Technology (Eurospeech 2003)


The context resolution component of a spoken dialogue system is responsible for interpreting a user's utterance in the context of previously spoken user utterances, system initiatives, and world knowledge. This thesis describes a new and independent Context Resolution (CR) server for the GALAXY conversational system framework. The server handles all the functionality of the previous CR component in a more generic and powerful manner. Among this functionality is the inheritance and masking of historical information, as well as reference and ellipsis resolution. The new server, additionally, features a component which attempts to reconstruct the intention of a user in the case of a robust parse, in which some semantic concepts from an utter-


“…Anecdotally, users of the flight domain game system [2] did not like the quality of speech produced by the general-purpose synthesizer. To improve the synthesis quality, in our current system we utilize the ENVOICE synthesizer [12], a concatenative text-to-speech engine with a scalable finite-state transducer implementation for unit selection. The costs of concatenation and substitution are calculated based on local phonetic context.…”

Section: Technical Components (mentioning)

confidence: 99%

An interactive interpretation game for learning Chinese

Chao, Seneff, Wang

2007

Speech and Language Technology in Education (SLaTE 2007)


In this paper, we present an interactive interpretation game for learning Chinese. We extend our previous work on a flight domain translation game by introducing a new topic that is more appropriate for language learners. We discuss new features that have been added to the existing translation game system. We also report results from a pilot study to evaluate if the game helps learners improve their ability to speak the target language.


“…In addition to this intuitive connection between SPEECHBUILDER domains and these technology components, there are several other reasons why this approach has been selected. First, significant effort has been devoted in the past at MIT to improving technology in dialogue system architecture [27], speech recognition [11], language understanding [24], language generation [1], discourse and dialogue [28], and, most recently speech synthesis [36]. Employing these HLT components minimizes duplication of effort, and maximizes SPEECHBUILDER'S flexibility to adopt technical advances made in these areas, which may be achieved in efforts entirely disjoint from…”

Section: Approach (mentioning)

confidence: 99%

“…In addition, an instance of the SLSInfo domain has been manually modified to use the ENVOICE concatenative speech synthesizer that is being developed at MIT [36]. This is encouraging for eventually being able to give developers the option of using ENVOICE as an optional synthesizer for SPEECHBUILDER domains (see Section 7.5).…”

Section: Lcsinfo and Slsinfo (mentioning)

confidence: 99%

“…The current implementation of SPEECHBUILDER uses the DECtalk commercial speech synthesizer [9], which provides speech synthesis of reasonable, but not exceptionally high, quality. The MIT concatenative speech synthesizer, ENVOICE [36], would be a significantly better-sounding solution. However, since an ENVOICE synthesizer is based on a corpus of speech samples specific to the domain it is being used for, the developer will need to do some work before being able to use ENVOICE in a given domain.…”

Section: Echo Script (mentioning)

confidence: 99%


Speechbuilder: facilitating spoken dialogue system development

Glass, Weinstein

2001

7th European Conference on Speech Communication and Technology (Eurospeech 2001)


SPEECHBUILDER is a suite of tools that helps facilitate the creation of mixed-initiative spoken dialogue systems for both novice and experienced developers of human language applications. SPEECHBUILDER employs intuitive methods of specification to allow developers to create human language interfaces to structured information stored in a relational database, or to control- and transaction-based applications. The goal of this project has been both to robustly accommodate the various scenarios where spoken dialogue systems may be needed, and to provide a stable and reliable infrastructure for design and deployment of applications. SPEECHBUILDER has been used in various spoken language domains, including a directory of the people working at the MIT Laboratory for Computer Science, an application to control the various physical items in a typical office environment, and a system for real-time weather information access.
