We describe our experiences building spoken language interfaces to four demonstration applications all involving 2-or 3-D spatial displays or gestural interactions: an air combat command and control simulation, an immersive VR tactical scenario viewer, a map-based air strike simulation tool with cartographic database, and a speech/gesture controller for mobile robots.
Description of video tape. Format: VHS. Duration: 9 min. 15 sec.

Immersive, interactive 3D computer display systems (often called virtual reality systems, or virtual environments) are rapidly emerging as practical options for training, command and control (C2), hazardous operations, visualization, and other applications. However, the need for improved control and navigation techniques is well recognized (Herndon et al., 1994). The Navy Center for Applied Research in Artificial Intelligence has developed NAUTILUS (Navy AUTomated Intelligent Language Understanding System), a general-purpose natural language processing system, which has previously been integrated with the graphical user interface of a simulation-based C2 system to illustrate the advantages to be gained by combining natural language understanding (NLU) and direct manipulation in a human-computer interface (Wauchope, 1994). Using the NAUTILUS system in the interface to a virtual environment (VE) is a natural extension of this work.

The purpose of the project documented in this video was to demonstrate and explore some of the capabilities of an NLU interface to a VE system, and to identify some of the research issues that need to be addressed in this area. It is important to recognize that NLU is not simply speech recognition, in which each individual utterance maps to a specific command. In an NLU system, a given sentence may have different meanings depending on the context, so a logical analysis of the utterance is required to determine the appropriate interpretation. This allows us to take advantage of certain powerful linguistic properties, as described below.

One major difficulty with interfaces to VE systems is that the user's hands and eyes are occupied in the virtual world, so standard input devices such as mice and keyboards, which require physical support and/or visual attention, are impractical.
Joysticks, gloves, and other manual input devices are useful for some types of control (pointing, manipulating objects), but they are not well suited to more abstract input functions. Language, however, is ideally suited to abstract manipulations; it is also the most natural form of communication for humans, and it does not require the use of one's hands or eyes. It is especially useful for controlling things that do not have a physical presence in the VE, such as object scale, display characteristics, and time. It also provides a powerful means of accessing the knowledge that underlies the VE by allowing the user to ask questions of the system. Using speech output in combination with speech recognition helps avoid textual displays, which can be difficult to read on immersive presentation equipment and which can interfere with the user's view and the "reality" of the virtual world.

The prototype system shown in this video uses off-the-shelf speech recognition and synthesis technology combined with the NAUTILUS system and VIEWER (Solan and Hill, 1993), a 3D tactical scenario playback system developed by NRL's Tactical Electronic Warfare Division for a s...
The major weakness of the current narrowband LPC synthesizer lies in the use of a "canned" invariant excitation signal. The use of such an excitation signal is based on three primary assumptions, namely, 1) that the amplitude spectrum of the excitation signal is flat and time invariant, 2) that the phase spectrum of the voiced excitation signal is a time-invariant function of frequency, and 3) that the probability density function of the phase spectrum of the unvoiced excitation signal is also time invariant. This paper critically examines these assumptions and presents modifications which improve the quality of the synthesized speech without requiring the transmission of additional data. Diagnostic acceptability measure (DAM) tests show an increase of up to five points in overall speech quality with the implementation of each of these improvements. These modifications can also improve the speech quality of LPC-based speech synthesizers.
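The synthesis model this abstract builds on can be illustrated with a minimal sketch: an all-pole LPC filter driven by a "canned" impulse-train excitation whose amplitude spectrum is flat and whose timing never varies, i.e. the invariant excitation the paper questions. The filter order, coefficients, sample rate, and pitch below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def lpc_synthesize(a, excitation, gain=1.0):
    """Run an excitation signal through an all-pole LPC filter
    1 / (1 - sum_k a[k] * z^-(k+1)), sample by sample."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        s = gain * excitation[n]
        for k in range(len(a)):
            if n - k - 1 >= 0:
                s += a[k] * out[n - k - 1]
        out[n] = s
    return out

# "Canned" voiced excitation: a fixed impulse train, which has a flat
# amplitude spectrum and time-invariant phase -- assumptions 1) and 2).
fs = 8000                      # sample rate in Hz (illustrative)
f0 = 100                       # constant pitch in Hz (illustrative)
excitation = np.zeros(800)
excitation[:: fs // f0] = 1.0  # one impulse per pitch period

a = np.array([0.5, -0.2])      # toy 2nd-order LPC coefficients
speech = lpc_synthesize(a, excitation)
```

The modifications the paper proposes amount to relaxing these invariance assumptions on `excitation` (its amplitude spectrum, voiced phase, and unvoiced phase statistics) without sending extra bits, since the excitation is regenerated at the receiver rather than transmitted.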
The perception of pitch waver in synthetic vowels was investigated. The waver was more easily detected when heard over a loudspeaker than over headphones. This effect seems to be related to the occurrence of amplitude variations as well as pitch variations in a live room environment.