Starting from a philosophy of integrating the components of multimodal interaction applications with 3D graphical environments, reusing already defined markup languages for describing graphics, and modeling graphical and spoken interaction on the basis of the interactive movie metaphor, we seek a markup language for modeling scenes, behavior and interaction. With the definition of this language, we hope to provide a common framework for developing applications that allow multimodal interaction on 3D stages. To this end, we have defined the basis of an architecture that allows us to integrate the components of such multimodal interaction applications in 3D virtual environments.

Keywords Spoken interaction · Graphical interaction · Human-computer interaction · Multimodality · Dialogue systems · Avatar · Virtual environments · 3D virtual reality · Rich internet applications · Behavior
Motivation and strategy

Introducing multimodal interaction can enrich the user experience (UX), because natural communication includes both speech and gesture. This has been demonstrated in augmented reality (AR) and virtual reality (VR) applications, where spoken interaction gives access to objects outside the user's view simply by naming them, while leaving the user's hands free. Moreover, spoken interaction is already commonplace in mobile and ubiquitous applications: they use speech recognition to analyze the speech signal and produce labels for the recognized words, that is, they use a spoken modality. Another modality used in mobile applications is based on graphical interaction, and it can be combined with speech recognition in several ways, as will be shown in Sect. 3.2.4, depending on how we want the various modalities to cooperate.

A modality is a process that analyzes and produces chunks of information [1], and combining several modalities improves user interaction by making it multimodal. For integrating modalities there is a fundamental W3C recommendation that serves as an architectural framework: the MMI architecture [2], a proposal of the W3C Multimodal Interaction (MMI) Working Group that is introduced in Sect. 3.3.2 (a schematic example of its life-cycle events is sketched at the end of this section). The Multimodal Interaction Activity seeks to extend the Web to allow users to dynamically select the most appropriate mode of interaction for their current needs, including any disabilities, while enabling developers to provide an effective user interface for whichever modes the user selects. Depending upon the device, users will be able to provide input via speech, handwriting, touchscreens and keystrokes, with output presented via displays, pre-recorded and synthetic speech, audio, and tactile mechanisms such as mobile phone vibrators and Braille strips.

The main issues about multimodal interaction that are not yet covered are building reliable multimodal systems and usable applications, designing usable adaptive multimodal interfaces, and improving the tools for creating multimodal applications and interfaces so that they can become more mainstream [3]. The challenges of 3D interfaces and VR/AR that are relevant for multimodal interaction remain unsolved, and they are related to the integration of V...
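As a concrete illustration of the integration mechanism that the MMI architecture [2] standardizes, the following minimal sketch shows two of its life-cycle events: the interaction manager asks a speech modality component to start recognition, and the component reports the recognized result wrapped in EMMA. The source and target identifiers, the context and request identifiers, and the recognized command ("open door") are illustrative placeholders, not values prescribed by the recommendation.

<!-- Interaction manager -> speech modality component: start recognition -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:startRequest source="IM-1" target="speechMC-1"
                    context="ctx-1" requestID="req-1"/>
</mmi:mmi>

<!-- Speech modality component -> interaction manager: recognized utterance -->
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <mmi:doneNotification source="speechMC-1" target="IM-1"
                        context="ctx-1" requestID="req-1" status="success">
    <mmi:data>
      <emma:emma xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
        <!-- One interpretation of the utterance, with its confidence score -->
        <emma:interpretation id="int-1" emma:medium="acoustic"
                             emma:mode="voice" emma:confidence="0.9">
          <command>open door</command>
        </emma:interpretation>
      </emma:emma>
    </mmi:data>
  </mmi:doneNotification>
</mmi:mmi>

Because every modality component exchanges this same event vocabulary with the interaction manager, a graphical component rendering the 3D stage can be coordinated with the speech component in exactly the same way; this loose coupling is the property that the architecture outlined in this paper builds on.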