2016
DOI: 10.1109/tpami.2015.2461544

ModDrop: Adaptive Multi-Modal Gesture Recognition

Abstract: We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning c…
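
The "random dropping of separate channels" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`moddrop`, `fuse`), the drop probability, the rule that at least one modality is always kept, and the absence of any rescaling are all assumptions made here for illustration only.

```python
import random


def moddrop(modalities, drop_prob=0.5, rng=None):
    """Randomly zero out whole modality feature vectors (training time only).

    modalities: dict mapping a modality name to its feature vector (list of floats).
    drop_prob:  hypothetical per-modality drop probability (not taken from the paper).
    """
    rng = rng or random.Random()
    names = list(modalities)
    # Decide independently for each modality whether it survives this step.
    keep = {name: rng.random() >= drop_prob for name in names}
    # Assumption of this sketch: never drop everything, so the fused
    # input is never entirely zero.
    if not any(keep.values()):
        keep[rng.choice(names)] = True
    return {
        name: list(feats) if keep[name] else [0.0] * len(feats)
        for name, feats in modalities.items()
    }


def fuse(modalities):
    """Concatenate per-modality features in a fixed (sorted-by-name) order."""
    fused = []
    for name in sorted(modalities):
        fused.extend(modalities[name])
    return fused


if __name__ == "__main__":
    # Hypothetical modality names and toy feature values.
    sample = {"depth": [1.0, 2.0], "audio": [3.0], "skeleton": [4.0, 5.0, 6.0]}
    print(fuse(moddrop(sample, drop_prob=0.5, rng=random.Random(0))))
```

Dropping an entire modality at once, rather than individual units, is what distinguishes this idea from ordinary dropout: the fused layers see training inputs in which a whole channel is missing.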

Citations: cited by 288 publications (244 citation statements)
References: 55 publications
“…Inspired by the recent progress in the field of deep learning 2D Convolutional Neural Networks (2D CNNs) have been applied to the Gesture Recognition field in order to extract spatial features [7], [8]. In recent studies, features were either concatenated into fixed sized gesture templates [7] or passed to HMM [13] or Recurrent Neural Networks [8] in order to model the temporal aspects of the gestures.…”
Section: D Convolutional Neural Network (mentioning)
confidence: 99%