Abstract: It is common for human beings to use gestures as a means of expression, as a complement to speech, or as a self-contained communication mode. In the field of Human-Computer Interaction, this behavior can be adopted to build alternative interfaces aimed at easing the relationship between the human and the computational element. Various gesture recognition techniques are currently described in the technical literature; however, validation studies of these techniques are usually performed…
“…Since SVMs [29] have gained much attention in recent times due to their powerful generalization capabilities as gesture classifiers [16], [18], we evaluate different feature learning schemes using SVMs. The following approaches are evaluated in this paper using our dataset: (i) the authors in [30], [31], [32] use Hu invariant moments for feature learning from images of different objects and gestures; (ii) unsupervised feature learning is applied by the authors in [33] using the Spatial Pyramid (generally referred to as Bag of Features or Bag of Words (BoW)), a combination of SIFT and k-means; (iii) shape properties of objects such as roundness, form factor, compactness, eccentricity, perimeter, and solidity are used by the authors in [31], [34]; (iv) skeletonization has been proposed by the authors in [35], [36] for gesture recognition tasks such as counting the number of fingers; (v) the Pyramid of Histograms of Oriented Gradients (PHOG) [37], a variant of the well-known HOG descriptor [38], gained popularity for its vectorized HOG feature learning approach; (vi) the Fast Fourier Transform (FFT) has been used by the authors in [39] to represent the shape of the hand contour in the frequency domain; (vii) Tiled CNNs [40] are supervised feature learners and classifiers able to learn complex invariances such as scale and rotation invariance.…”
Section: A. Existing Approaches
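As an illustration of approach (i) in the excerpt above, the following is a minimal sketch of Hu-moment feature extraction using OpenCV; the input file name, the Otsu thresholding step, and the log-scaling of the moments are illustrative assumptions, not details taken from the cited papers [30], [31], [32].

```python
import cv2
import numpy as np

def hu_moment_features(binary_image: np.ndarray) -> np.ndarray:
    """Return the 7 Hu invariant moments of a binary silhouette.

    Log-scaling (an assumption here) is commonly used to compress the
    moments' large dynamic range before classification, e.g. with an SVM.
    """
    moments = cv2.moments(binary_image, binaryImage=True)
    hu = cv2.HuMoments(moments).flatten()
    # Signed log transform; the epsilon guards against log(0).
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# Hypothetical usage: threshold a grayscale hand image, then extract.
img = cv2.imread("hand.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
features = hu_moment_features(mask)  # 7-D, scale/rotation invariant
```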
“…Recently, different research efforts on 2D appearance model-based methods for gesture recognition have emerged [9], [10], [11], [12], [13], [14], [15], amongst which supervised and unsupervised learning techniques such as Neural Networks (NNs), Support Vector Machines (SVMs) and Nearest-Neighbor classifiers [16], [17], [18] have gained popularity. However, feature learning is not part of such classification schemes and must be performed separately to compute features such as edges, gradients, pixel intensities and object shape.…”
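The separation this excerpt describes, with features computed first and the classifier trained afterwards, can be sketched as follows. This is a minimal sketch assuming scikit-image's HOG descriptor as the hand-crafted feature and scikit-learn's SVM as the classifier; the arrays are placeholder data, not a real gesture dataset.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def extract_hog(images: np.ndarray) -> np.ndarray:
    """Feature step: one HOG descriptor per grayscale image."""
    return np.array([hog(im, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for im in images])

# Placeholder dataset: (N, H, W) grayscale gestures and (N,) class ids.
train_imgs = np.random.rand(20, 64, 64)
train_labels = np.random.randint(0, 6, 20)

clf = SVC(kernel="rbf")                          # classification step
clf.fit(extract_hog(train_imgs), train_labels)   # features fed in separately
pred = clf.predict(extract_hog(train_imgs[:2]))
```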
Abstract: Automatic recognition of gestures using computer vision is important for many real-world applications such as sign language recognition and human-robot interaction (HRI). Our goal is a real-time hand gesture-based HRI interface for mobile robots. We use a state-of-the-art big and deep neural network (NN) combining convolution and max-pooling (MPCNN) for supervised feature learning and classification of hand gestures given to mobile robots by humans wearing colored gloves. The hand contour is retrieved by color segmentation, then smoothed by morphological image processing, which eliminates noisy edges. Our big and deep MPCNN classifies 6 gesture classes with 96% accuracy, nearly three times better than the nearest competitor. Experiments with mobile robots using an ARM 11 533 MHz processor achieve real-time gesture recognition performance.
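A minimal sketch of the preprocessing stage this abstract outlines (color segmentation of the glove followed by morphological smoothing of the hand contour), assuming OpenCV; the HSV range and the input frame are hypothetical and would need tuning to the actual glove color.

```python
import cv2

frame = cv2.imread("frame.png")                  # hypothetical input frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Segment the colored glove (example range for a red glove; tune per glove).
mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))

# Morphological opening removes speckle noise, closing fills small holes,
# smoothing the contour before it reaches the classifier.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Keep the largest external contour as the hand.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea) if contours else None
```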
“…However, HAR is restricted to the small part of the environment where the HAR sensor has been placed. Absolute anchoring approaches are principally vision-based methods and can be further categorized by how they model variation in time: direct classification methods classify image features without using temporal information, so HAR is usually performed on each frame individually, as in [12], [13], [14] and [15]; temporal state-space methods, by contrast, treat time as an explicit dimension, where every observation corresponds to an image representation at a given time, as in [16], [17], [18] and [19].…”
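The distinction this excerpt draws can be summarized schematically; here `extract`, `clf`, and `seq_model` are hypothetical stand-ins for a feature extractor, a per-frame classifier, and a sequence model (e.g., an HMM decoder), none of which come from the cited works.

```python
import numpy as np

def direct_classification(frames, extract, clf):
    """Per-frame HAR: each frame is labeled independently, no temporal model."""
    return [clf.predict(extract(f).reshape(1, -1))[0] for f in frames]

def temporal_classification(frames, extract, seq_model):
    """Temporal state-space HAR: the whole observation sequence is scored."""
    observations = np.stack([extract(f) for f in frames])  # shape (T, D)
    # `most_likely_activity` is a hypothetical method, e.g. Viterbi decoding
    # in an HMM; it consumes the full sequence rather than single frames.
    return seq_model.most_likely_activity(observations)
```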
Human activity recognition from movement-related signals or image sequences is a challenging problem in computer vision. Human activities can be decoded from a variety of communication channels, but it has been shown that the head plays a prominent role in emphasizing the message being communicated. Recognizing activities from head movements is well suited to this task because the head has a nearly constant shape and appearance during communication. The spatiotemporal segmentation of head movements can also be performed by analyzing their trajectories. In this study, we present a general model for the description and recognition of head movements. The basic idea is extended by introducing a human activity database to support better decisions during recognition. The proposed approach takes into consideration facial regions that encode essential information about head movements. The essence of head movements is extracted from a motion history image representation and aligned by dynamic time warping. The efficiency of our system is also demonstrated by the recognition of head-drawn letters.
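The two core ingredients named in this abstract, motion history images and dynamic time warping, can be sketched as follows; the frame-differencing motion detector and the parameter values are assumptions, since the abstract does not specify them.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=30.0, diff_thresh=25):
    """Decay the motion history image (MHI) one step, then stamp new motion.

    Pixels that moved in the latest frame get the full weight tau; older
    motion fades linearly, so the MHI encodes both where and how recently
    motion occurred.
    """
    motion = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    mhi = np.maximum(mhi - 1.0, 0.0)     # older motion fades out
    mhi[motion > diff_thresh] = tau      # fresh motion gets full weight
    return mhi

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping between 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```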