This paper describes research on spoken language interfaces for interactive problem solving. A spoken language interface combines speech recognition technology with language understanding technology to provide an application-specific interface. The interface converts acoustic input (speech) into a series of words, which are interpreted to produce the appropriate response and/or action. The system response may be spoken or displayed, as appropriate to the needs of the user. Spoken language interfaces offer significant benefits over conventional user interfaces for certain classes of applications, particularly hands-busy or eyes-busy applications, where typed input and/or visual displays may not be possible or convenient. To illustrate this, we present two examples of spoken language interfaces developed at MIT: an interactive system for urban navigation, VOYAGER, and an air travel planning system, ATIS. The VOYAGER system currently runs in a few times real time and is able to provide answers for more than 50% of queries from untrained users.
SPOKEN LANGUAGE SYSTEMS

The term spoken language interface refers to an interface that accepts spoken natural language as input, interprets that input, and produces an appropriate response to the user. Some examples of spoken language interfaces that have been explored include air travel planning [3], urban exploration and navigation [4], logistics planning [1], and office management, including voice mail and calendar access [2]. The goal of this research is to provide an interface that requires no prior enrollment for a new speaker, that is capable of handling spontaneous speech, and that can interact cooperatively with the user, within a vocabulary appropriate to the task (in the range of 300-3000 words).

A spoken language interface must not only convert the acoustic input (speech) into a series of words but also understand that sequence of words in order to produce an appropriate response. However, recognition and understanding of the input are not enough: the system must also communicate the "answer" back to the user, via language generation and speech synthesis. A useful system also requires the ability to handle natural conversational interaction, including expressions that depend on previous context and/or refer to previously mentioned objects ("How do

Authors' names are listed alphabetically.