2018
DOI: 10.1145/3131343

A Unified Framework for Multi-Modal Isolated Gesture Recognition

Abstract: In this article, we focus on isolated gesture recognition and explore different modalities by involving RGB stream, depth stream, and saliency stream for inspection. Our goal is to push the boundary of this realm even further by proposing a unified framework that exploits the advantages of multi-modality fusion. Specifically, a spatial-temporal network architecture based on consensus-voting has been proposed to explicitly model the long-term structure of the video sequence and to reduce estimation variance whe…
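The consensus-voting idea in the abstract can be illustrated with a minimal sketch: class scores from several snippets sampled across a video are averaged, so the prediction reflects the whole sequence rather than any single frame, which is what reduces estimation variance. The snippet count, class count, and the `consensus_vote` helper below are hypothetical illustrations, not the paper's actual network.

```python
import numpy as np

def consensus_vote(snippet_logits: np.ndarray) -> int:
    """Average per-snippet class scores and return the winning class.

    snippet_logits: shape (num_snippets, num_classes), one row of class
    scores per sampled video snippet (values here are illustrative).
    """
    # Softmax each snippet's logits so the vote is over probabilities.
    shifted = snippet_logits - snippet_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Consensus: average the per-snippet distributions, then take argmax.
    return int(probs.mean(axis=0).argmax())

# Toy example: 5 snippets of one video, 10 gesture classes.
rng = np.random.default_rng(0)
print(consensus_vote(rng.normal(size=(5, 10))))
```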

Cited by 58 publications (27 citation statements)
References 25 publications

Citation statements (ordered by relevance):
“…Compared with the performances of the first round, the best recognition rate r obtained in round 2 improved considerably (from 56.90% to 67.71% on the test set). We notice that the new baseline [10] also achieved the second best performance. This baseline uses multiple modalities (RGB, depth, optical flow and saliency streams) and a spatio-temporal network architecture, with a consensus-voting strategy (see [10] for details). Table 2 shows a brief summary of each participant's/team's methodology.…”
Section: Results and Methods (mentioning; confidence: 82%)
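As a rough illustration of the multi-stream fusion this statement describes (not the authors' exact scheme, whose details are in [10]), a simple late fusion averages per-modality class probabilities; the equal default weights below are a placeholder assumption, not the paper's values.

```python
import numpy as np

def late_fusion(stream_probs: dict, weights: dict | None = None) -> int:
    """Fuse per-modality class probabilities by (weighted) averaging.

    stream_probs maps a modality name to a (num_classes,) probability
    vector. Equal weights are a placeholder, not the paper's values.
    """
    weights = weights or {name: 1.0 for name in stream_probs}
    total = sum(weights.values())
    fused = sum(w / total * stream_probs[name] for name, w in weights.items())
    return int(np.argmax(fused))

# Hypothetical per-stream outputs for a 10-class gesture problem.
rng = np.random.default_rng(1)
streams = {m: rng.dirichlet(np.ones(10))
           for m in ("rgb", "depth", "flow", "saliency")}
print(late_fusion(streams))
```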
“…For isolated recognition tasks such as Isolated Gesture [10] and Action Recognition [23], most datasets provide instance-level annotations, i.e., a single label for each video clip with no temporal localisation. To train deep networks using instance-level annotations, researchers [11,21,27,28] frequently assign the provided instance labels to all time steps and train neural networks using Cross Entropy Loss [17]. However, identifying every part of a sequence with the same label can cause class ambiguity, as different stages of a sequence can have different spatio-temporal features.…”
Section: Introduction (mentioning; confidence: 99%)
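The labelling convention this statement criticises, broadcasting one clip-level label to every time step and applying cross-entropy at each step, can be sketched in a few lines. The tensor shapes and the toy model outputs below are hypothetical and not tied to any of the cited works.

```python
import torch
import torch.nn.functional as F

# Hypothetical per-timestep logits from a temporal model:
# batch of 4 clips, 16 time steps, 10 gesture classes.
logits = torch.randn(4, 16, 10)
clip_labels = torch.tensor([3, 7, 1, 9])  # one instance-level label per clip

# Broadcast each clip's single label to all of its time steps ...
timestep_labels = clip_labels.unsqueeze(1).expand(-1, 16)

# ... and apply cross-entropy at every step, as described above.
loss = F.cross_entropy(logits.reshape(-1, 10), timestep_labels.reshape(-1))
print(loss.item())
```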