The current practice of designing the auditory mode in the user interface is poorly understood. In this survey, we aim to reveal the common understanding of the role of audio in human-computer interaction and how designers approach design tasks involving audio. We investigate which guidelines and principles participants use in their designs and what guidance is needed to improve the quality of auditory design. The responses are analysed and interpreted using quantitative and qualitative methods. The 86 participants enabled us to draw a relatively accurate picture of how the field is perceived and helped us identify problems in the design of effective audio in the user interface. The results of the survey are subsequently developed into requirements for a methodological design framework, with the aim of providing easily accessible guidance for designers integrating audio into the user interface.
In this paper, we describe an experiment that studies temporal synchronization between speech (Japanese) and hand pointing gestures. Gesture (G) is shown to be synchronized with either the nominal or the deictic ("this", "that", "here", etc.) expression of a phrase. It is also shown that G is predictable within the [-200 ms, +400 ms] interval around the beginning of its related expression. The use of such a quantitative model of natural speech and gesture integration (in multimodal interfaces and speech recognition systems) is also discussed.
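The reported synchronization window can be sketched as a simple timing check. This is a hypothetical illustration, not code from the paper; all names and the sample timestamps are illustrative assumptions.

```python
# Illustrative sketch: deciding whether a pointing gesture's onset falls
# within the reported [-200 ms, +400 ms] window around the start of its
# related nominal/deictic expression. Names and values are hypothetical.

SYNC_WINDOW_MS = (-200, 400)

def gesture_is_synchronized(gesture_onset_ms: float,
                            expression_onset_ms: float,
                            window=SYNC_WINDOW_MS) -> bool:
    """Return True if the gesture onset lies within `window` (in ms)
    around the onset of the related expression."""
    offset = gesture_onset_ms - expression_onset_ms
    return window[0] <= offset <= window[1]

# A gesture starting 150 ms before the word "this" begins (offset = -150 ms):
print(gesture_is_synchronized(850, 1000))   # within the window
# A gesture starting 600 ms before the expression (offset = -600 ms):
print(gesture_is_synchronized(400, 1000))   # outside the window
```

A multimodal interface could use such a check to decide which recognized word a detected pointing gesture should be bound to.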
In this paper, we survey the different types of error-handling strategies that have been described in the literature on recognition-based human-computer interfaces. A wide range of strategies can be found in spoken human-machine dialogues, handwriting systems, and multimodal natural interfaces. We then propose a taxonomy for classifying error-handling strategies along the following three dimensions: the main actor in the error-handling process (machine versus user), the purpose of the strategy (error prevention, discovery, or correction), and the use of different modalities of interaction. The requirements that different error-handling strategies place on different sets of interaction modalities are also discussed. The main aim of this work is to establish a classification that can serve as a tool for understanding how to develop more efficient and more robust multimodal human-machine interfaces.
Keywords: recognition-based technology, multimodal interfaces, error-handling, taxonomy, interaction design, interaction robustness
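The three-dimensional taxonomy described above lends itself to a direct encoding as a data structure. The following is a minimal sketch under that reading; the class and member names are illustrative assumptions, not identifiers from the paper.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the three taxonomy dimensions:
# actor, purpose, and the interaction modalities the strategy relies on.

class Actor(Enum):
    MACHINE = "machine"
    USER = "user"

class Purpose(Enum):
    PREVENTION = "prevention"
    DISCOVERY = "discovery"
    CORRECTION = "correction"

@dataclass(frozen=True)
class ErrorHandlingStrategy:
    name: str
    actor: Actor
    purpose: Purpose
    modalities: frozenset  # e.g. {"speech"}, {"speech", "pen"}

# Example classification: a system-initiated spoken confirmation prompt,
# which tries to prevent errors before they propagate to the application.
confirm = ErrorHandlingStrategy(
    name="explicit confirmation",
    actor=Actor.MACHINE,
    purpose=Purpose.PREVENTION,
    modalities=frozenset({"speech"}),
)
print(confirm.actor.value, confirm.purpose.value)
```

Placing each surveyed strategy at a point in this three-dimensional space is what makes the taxonomy usable as a comparison tool.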
Recognition-based technology
Multimodal interaction refers to interaction with the virtual and physical environment through natural modes of communication such as speech, body gestures, handwriting, graphics, or gaze. Unlike keyboard and mouse input, natural modes of communication are usually non-deterministic and have to be "recognised" by a recognition system before they can be passed on to an application. Recent developments in recognition-based technology (e.g. speech and gesture recognition) have opened a myriad of new possibilities for the design and implementation of multimodal applications. Handwriting recognisers, for example, are being used in personal digital assistants (e.g. Paragon's multilingual PenReader software for Pocket PC devices), and speech recognition has made its way into desktop machines (e.g. IBM's ViaVoice TM speech recognition engines). However, designing and