This paper describes the techniques used to separate the hand from a cluttered background in a gesture recognition system. Target colors are identified using a histogram-like structure called a Color Predicate, which is trained in real time using a novel algorithm. Running on standard PC hardware, the segmentation is fast and accurate enough to support an interactive user interface. The method has proven flexible across a range of office environments, segmenting the hands of users with many different skin tones. Variations have been applied to other problems, including finding face candidates in video sequences.
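As a rough illustration of what a histogram-like color lookup could look like, the sketch below quantizes RGB values into coarse cells, adds a vote for each target-colored training pixel, and subtracts a vote for each background pixel. This is a simplified stand-in, not the paper's training algorithm: the class name `ColorPredicate`, the RGB color space, the bin count, and the vote threshold are all assumptions made for the example.

```python
from collections import defaultdict


class ColorPredicate:
    """Simplified histogram-like color membership table (illustrative only).

    Assumptions: RGB input in 0..255, uniform quantization, and a plain
    +1 / -1 voting rule. The paper's actual structure and training
    algorithm may differ.
    """

    def __init__(self, bins_per_channel=16, threshold=0):
        self.step = 256 // bins_per_channel  # width of one quantization cell
        self.threshold = threshold
        self.votes = defaultdict(int)        # cell -> accumulated vote count

    def _cell(self, rgb):
        r, g, b = rgb
        return (r // self.step, g // self.step, b // self.step)

    def train(self, target_pixels, background_pixels=()):
        """Add one vote per target pixel, remove one per background pixel."""
        for pixel in target_pixels:
            self.votes[self._cell(pixel)] += 1
        for pixel in background_pixels:
            self.votes[self._cell(pixel)] -= 1

    def __call__(self, rgb):
        """Return True if the color falls in a cell voted as target."""
        return self.votes.get(self._cell(rgb), 0) > self.threshold

    def segment(self, image):
        """Map an image (rows of RGB tuples) to a boolean mask."""
        return [[self(pixel) for pixel in row] for row in image]


predicate = ColorPredicate()
predicate.train(
    target_pixels=[(200, 150, 120), (205, 152, 125)],
    background_pixels=[(30, 30, 30)],
)
mask = predicate.segment([[(201, 150, 121), (30, 30, 30)]])
```

Because classification is a single table lookup per pixel, a structure of this kind is cheap enough to evaluate on every frame, which is consistent with the real-time claim in the abstract.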
Computer vision and other direct sensing technologies have progressed to the point where we can detect many aspects of a user's activity reliably and in real time. Simply recognizing the activity is not enough, however. If perceptual interaction is going to become a part of the user interface, we must turn our attention to the tasks we wish to perform and the methods for performing them effectively. This paper attempts to further our understanding of vision-based interaction by looking at the steps involved in building practical systems, giving examples from several existing systems. We classify the types of tasks well suited to this type of interaction as pointing, control or selection, and discuss interaction techniques for each class. We address the factors affecting the selection of the control action, and the various types of control signals that can be extracted from visual input. We present our design for widgets that perform different types of tasks, along with techniques, similar to those used with established user interface devices, that give users the control they need to perform each task well. We look at ways to combine individual widgets into Visual Interfaces that allow the user to perform these tasks both concurrently and sequentially.
This work describes the design of a functioning user interface based on visual recognition of hand gestures, and details its performance. In the interface, gesture replaces the mouse for many actions, including selecting, moving and resizing windows. A camera below the screen observes the user. The hand is segmented from the background using color. Features of the hand's motion are extracted from the sequence of segmented images, and when needed the hand's pose is classified using a neural net. This information is parsed by a task-specific grammar. The system runs in real time on standard PC hardware. It has been demonstrated with various users in several different office environments. Having experimented with a functioning gestural interface, the authors discuss the practicality and best applications of this technology.
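The motion-feature step can be pictured with a minimal sketch, assuming each segmented frame is a boolean mask (rows of True/False values). The abstract does not say which features the system extracts; the centroid displacement and hand area used here, and the function names `centroid_and_area` and `motion_features`, are hypothetical choices for illustration.

```python
def centroid_and_area(mask):
    """Return ((x, y) centroid, pixel count) of the True cells in a mask.

    The centroid is None when the mask contains no hand pixels.
    """
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, is_hand in enumerate(row):
            if is_hand:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None, 0
    return (sum_x / count, sum_y / count), count


def motion_features(masks):
    """Yield (dx, dy, area) between consecutive frames that contain a hand.

    Assumed feature set: centroid displacement plus current hand area.
    Frames with no hand pixels are skipped.
    """
    features = []
    previous = None
    for mask in masks:
        centroid, area = centroid_and_area(mask)
        if centroid is None:
            continue
        if previous is not None:
            dx = centroid[0] - previous[0]
            dy = centroid[1] - previous[1]
            features.append((dx, dy, area))
        previous = centroid
    return features


frames = [
    [[True, False, False],
     [False, False, False]],
    [[False, False, False],
     [False, False, True]],
]
result = motion_features(frames)
```

In a pipeline like the one described, per-frame features of this kind would feed the later stages (pose classification when needed, then the grammar), but how the original system wires those stages together is not specified here.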