A four-stream ConvNet based on spatial and depth flow for human action classification using RGB-D data

Srihari, D.; Kishore, P. V. V.; Kumar, E. Kiran; Kumar, D. Anil; Kumar, M. Teja Kiran; Prasad, M. V. D.; Prasad, Ch. Raghava

doi:10.1007/s11042-019-08588-9

Cited by 21 publications

(15 citation statements)

References 38 publications

“…4. The scores from CLANet are found to be better than our previous work in [7], where we used multi stream CNN with motion information. The reason for higher accuracies is because of the LSTM network which models the time series information in a more accurately.…”

Section: B Clanet Performance (mentioning)

confidence: 61%

“…Inspired from the above benchmark datasets, we collected our own BVRCAction3D action dataset with 40 single human and 10 two human actions using 5 subjects. The complete list of actions is available at [7]. Fig.…”

Section: A Datasets and Performance Measures (mentioning)

confidence: 99%

“…Previously, we approached the above problem by dividing multiple modalities to fixed length action sequences which are then arranged as a multi layered multi modal tensor. These multi-dimensional tensors are processed through deep convolutional neural networks (CNN) for learning spatial representations thereby completely ignoring the temporal structures [7].…”

Section: Introduction (mentioning)

confidence: 99%

“…The most formidable of these deep learning models are grouped into spatial and temporal domains. In spatial domain the models extract features with respect to the pixel location in image space using models such as Convolutional Neural Networks (CNNs) [2], [7]. For temporal or time series modelling of the RGB-D data, Recurrent neural networks (RNNs) and their upgrades such as Long Short-Term Memory (LSTM) nets [15], [16].…”

Section: Introduction (mentioning)

confidence: 99%

“…However, these models require additional computation time in the form of motion vectors which makes them computationally inefficient due to data alignment problems. Moreover, few also tried 4 streams by adding motion information from depth sequences producing better recognition accuracies than the previous 2 stream model [7]. Similar to the above models, properties of the RGB and depth modalities have produced efficient action recognition algorithms such as depth rank pooling with CNNs [21], scene flow based RGB-D channels on CNN [22] and sequence based methods with RNNs [23].…”

Section: Introduction (mentioning)

confidence: 99%
