The current practice of designing the auditory mode in the user interface is poorly understood. In this survey, we aim to reveal the common understanding of the role of audio in human-computer interaction and how designers approach design tasks involving audio. We investigate which guidelines and principles participants use in their designs and what guidance is needed to improve the quality of auditory design. The responses are analysed and interpreted with quantitative and qualitative methods. The 86 participants enabled us to draw a relatively accurate picture of how the field is perceived and helped us identify problems in the design of effective audio in the user interface. The results of the survey are subsequently developed into requirements for a methodological design framework, with the aim of providing easily accessible guidance for designers integrating audio into the user interface.
In this paper, we describe an experiment that studies temporal synchronization between speech (Japanese) and hand pointing gestures. The gesture (G) is shown to be synchronized with either the nominal or the deictic ("this", "that", "here", etc.) expression of a phrase. It is also shown that G is predictable within a [-200 ms, +400 ms] interval around the beginning of its related expression. The use of such a quantitative model of natural speech and gesture integration, in multimodal interfaces and speech recognition systems, is also discussed.
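As a rough illustration, the reported interval can be read as a simple synchrony test: a pointing gesture counts as aligned with its nominal or deictic expression if its onset falls between 200 ms before and 400 ms after the expression's onset. The sketch below is ours, not the authors' model; the function name and the example timestamps are hypothetical.

```python
# Window bounds reported in the study, in milliseconds relative to the
# onset of the related nominal/deictic expression.
WINDOW_BEFORE_MS = -200
WINDOW_AFTER_MS = 400

def is_synchronized(gesture_onset_ms: float, expression_onset_ms: float) -> bool:
    """Return True if the gesture onset lies inside the reported window."""
    offset = gesture_onset_ms - expression_onset_ms
    return WINDOW_BEFORE_MS <= offset <= WINDOW_AFTER_MS

# Example: a gesture starting 150 ms before the word "kore" ("this").
print(is_synchronized(gesture_onset_ms=1850, expression_onset_ms=2000))  # True
```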
In this paper, we survey the different types of error-handling strategies that have been described in the literature on recognition-based human-computer interfaces. A wide range of strategies can be found in spoken human-machine dialogues, handwriting systems, and multimodal natural interfaces. We then propose a taxonomy for classifying error-handling strategies along three dimensions: the main actor in the error-handling process (machine versus user), the purpose of the strategy (error prevention, discovery, or correction), and the use of different modalities of interaction. The requirements that different error-handling strategies place on different sets of interaction modalities are also discussed. The main aim of this work is to establish a classification that can serve as a tool for understanding how to develop more efficient and more robust multimodal human-machine interfaces.
Keywords: recognition-based technology, multimodal interfaces, error handling, taxonomy, interaction design, interaction robustness
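For concreteness, the three dimensions of the taxonomy can be pictured as a small data structure. The encoding below is our own sketch, with illustrative names (Actor, Purpose, ErrorHandlingStrategy) that do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Actor(Enum):          # who drives the error-handling process
    MACHINE = "machine"
    USER = "user"

class Purpose(Enum):        # what the strategy is for
    PREVENTION = "prevention"
    DISCOVERY = "discovery"
    CORRECTION = "correction"

@dataclass(frozen=True)
class ErrorHandlingStrategy:
    name: str
    actor: Actor
    purpose: Purpose
    modalities: frozenset[str]  # interaction modalities the strategy relies on

# Example: the user corrects a misrecognized utterance by rewriting it with a pen.
pen_correction = ErrorHandlingStrategy(
    name="cross-modal correction",
    actor=Actor.USER,
    purpose=Purpose.CORRECTION,
    modalities=frozenset({"speech", "handwriting"}),
)
print(pen_correction.purpose)  # Purpose.CORRECTION
```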
Recognition-based technology
Multimodal interaction refers to interaction with the virtual and physical environment through natural modes of communication such as speech, body gestures, handwriting, graphics, or gaze. Unlike keyboard and mouse input, natural modes of communication are usually non-deterministic and have to be "recognised" by a recognition system before they can be passed on to an application. Recent developments in recognition-based technology (e.g. speech and gesture recognition) have opened a myriad of new possibilities for the design and implementation of multimodal applications. Handwriting recognisers, for example, are used in personal digital assistants (e.g. Paragon's multilingual PenReader software for Pocket PC devices), and speech recognition has made its way onto desktop machines (e.g. IBM's ViaVoice speech recognition engines). However, designing and …
The design and evaluation of multimodal interaction is difficult, and developing multimodal interaction systems is a significant challenge for designers in industry. Although past research has presented various methodologies, these address only specific cases of multimodality and fail to generalise to a range of applications. In this paper, we present a usability framework for the design and evaluation of multimodal interaction. First, in the early phase of multimodality design, elementary multimodal commands are elicited using traditional usability techniques. Second, based on the CARE (Complementarity, Assignment, Redundancy, and Equivalence) properties and the FSM (Finite State Machine) formalism, the original set of elementary commands is automatically expanded to form a more comprehensive set of multimodal commands. Third, this new set of multimodal commands is evaluated in two ways: user testing and error-robustness evaluation. This framework acts as a structured and general methodology for both designing and evaluating multimodal interaction. We expect that it will help designers produce more usable multimodal systems.
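The expansion step can be pictured roughly as follows: if each slot of an elementary command can be filled by any of several equivalent modality events (the CARE Equivalence property), enumerating one event per slot yields an expanded multimodal command set. The sketch below is a much-simplified illustration of our own; the command, its slots, and the expand function are hypothetical, and the paper's FSM formalism is richer than this product enumeration.

```python
from itertools import product

# Hypothetical elementary command: each slot lists equivalent modality events.
delete_command = [
    {"say 'delete'", "press delete key"},        # action slot
    {"say 'this file'", "point at file icon"},   # object slot
]

def expand(command):
    """Enumerate all multimodal realisations, picking one event per slot."""
    return [" + ".join(path) for path in product(*command)]

for variant in expand(delete_command):
    print(variant)
# e.g. "say 'delete' + point at file icon"  (a complementary speech+gesture form)
```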
The multimodal dimension of a user interface raises numerous problems that are not present in more traditional interfaces. In this paper, we briefly review current approaches to software design and modality-integration techniques for multimodal interaction. We then propose a simple framework for describing multimodal interaction designs and for combining sets of user inputs of different modalities. We show that the proposed framework can help designers reason about synchronization-pattern problems and test interaction robustness.
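One way to picture "combining sets of user inputs of different modalities" is a time-window fusion pass that groups near-simultaneous events into a single multimodal input. The sketch below is a hypothetical illustration, not the paper's framework; the 500 ms window and all names are our assumptions.

```python
from dataclasses import dataclass

FUSION_WINDOW_MS = 500  # assumed threshold; the paper does not fix a value

@dataclass
class InputEvent:
    modality: str   # e.g. "speech", "gesture"
    content: str
    t_ms: float     # onset timestamp in milliseconds

def fuse(events):
    """Group time-sorted events whose successive onsets fall within the window."""
    groups, current = [], []
    for ev in sorted(events, key=lambda e: e.t_ms):
        if current and ev.t_ms - current[-1].t_ms > FUSION_WINDOW_MS:
            groups.append(current)
            current = []
        current.append(ev)
    if current:
        groups.append(current)
    return groups

events = [InputEvent("speech", "put that", 1000),
          InputEvent("gesture", "point(map, x=3, y=7)", 1200),
          InputEvent("speech", "there", 1600)]
print(fuse(events))  # one fused group: speech + gesture + speech
```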