In this paper, we develop a vision-based system that employs a combined RGB and depth descriptor to classify hand gestures. The method is studied for a human-machine interface application in the car. Two interconnected modules are employed: one that detects a hand in the region of interaction and performs user classification, and another that performs gesture recognition. The feasibility of the system is demonstrated using a challenging RGBD hand gesture data set collected under settings of common illumination variation and occlusion.
Index Terms: Depth cue analysis, driver assistance systems, hand gesture recognition, human-machine interaction, infotainment.
I. INTRODUCTION
Recent years have seen tremendous growth in novel devices and techniques for human-computer interaction (HCI). These draw upon human-to-human communication modalities to make HCI more intuitive and easier to use. In particular, interfaces incorporating hand gestures have gained popularity in many fields of application. In this paper, we are concerned with the automatic visual interpretation of dynamic hand gestures and study these in the framework of an in-vehicle interface. A real-time vision-based system is developed, with the goal of robust recognition of hand gestures performed by driver and passenger users. The techniques and analysis presented are applicable to many other application fields requiring hand gesture recognition in visually challenging real-world settings.
Motivation for In-Vehicle Gestural Interfaces: In this paper, we are mainly concerned with developing a vision-based hand gesture recognition system that can generalize over different users and operating modes and show robustness under challenging visual settings. In addition to the general study of robust descriptors and fast classification schemes for hand gesture recognition, we are motivated by recent research showing advantages of gestural interfaces over other forms of interaction for certain HCI functionalities.
Among tactile, touch, and gestural in-vehicle interfaces, gesture interaction was reported to pose certain advantages over the other two, such as lower visual load, reduced driving errors,
Abstract: We aim to study the modeling limitations of the commonly employed boosted decision trees classifier. Inspired by the success of large, data-hungry visual recognition models (e.g., deep convolutional neural networks), this paper focuses on the relationship between the modeling capacity of the weak learners, dataset size, and dataset properties. A set of novel experiments on the Caltech Pedestrian Detection benchmark results in the best known performance among non-CNN techniques while operating at fast run-time speed. Furthermore, the performance is on par with deep architectures (9.71% log-average miss rate), while using only HOG+LUV channels as features. The conclusions from this study are shown to generalize over different object detection domains, as demonstrated on the FDDB face detection benchmark (93.37% accuracy). Despite the impressive performance, this study reveals the limited modeling capacity of the common boosted trees model, motivating a need for architectural changes in order to compete with multi-level and very deep architectures.
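The log-average miss rate quoted above is a summary of a detector's miss rate versus false-positives-per-image (FPPI) curve. The following is a minimal sketch of how such a number is commonly computed, assuming the usual Caltech-style convention of sampling the curve at nine FPPI values evenly spaced in log space between 10^-2 and 10^0 and taking the geometric mean. The function name, argument names, and the fallback value of 1.0 for unreached reference points are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI points.

    Hypothetical helper: `fppi` and `miss_rate` are assumed to be
    equal-length arrays describing one detector's curve.
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)

    # Sort the curve by increasing FPPI so it can be searched.
    order = np.argsort(fppi)
    fppi, miss_rate = fppi[order], miss_rate[order]

    # Reference FPPI values: 10^-2 ... 10^0, evenly spaced in log space.
    refs = np.logspace(-2.0, 0.0, num_points)

    sampled = np.empty(num_points)
    for i, ref in enumerate(refs):
        # Index of the last curve point whose FPPI does not exceed ref.
        idx = np.searchsorted(fppi, ref, side="right") - 1
        # Assumption: if the curve never reaches this FPPI, count it as
        # a miss rate of 1.0 (every object missed).
        sampled[i] = miss_rate[idx] if idx >= 0 else 1.0

    # Average in the log domain; clip to avoid log(0).
    return float(np.exp(np.mean(np.log(np.maximum(sampled, 1e-10)))))
```

Averaging in the log domain means the summary is not dominated by the high miss rates at the low-FPPI end of the curve, which is why the metric is reported as a single percentage.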