2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.167

3D Action Recognition from Novel Viewpoints

Abstract: We propose a human pose representation model that transfers human poses acquired from different unknown views to a view-invariant high-level space. The model is a deep convolutional neural network and requires a large corpus of multiview training data which is very expensive to acquire. Therefore, we propose a method to generate this data by fitting synthetic 3D human models to real motion capture data and rendering the human poses from numerous viewpoints. While learning the CNN model, we do not use action la…
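
The data-generation step described in the abstract — rendering the same 3D pose from many camera viewpoints to obtain view-paired training examples — can be sketched roughly as below. This is a minimal illustration under assumptions, not the authors' pipeline: the `sample_viewpoints`, `look_at`, and `project_pose` helpers, the 20-joint placeholder skeleton, and the simple pinhole projection are all invented for the example; the paper renders fitted synthetic 3D human models rather than bare joint sets.

```python
import numpy as np

def sample_viewpoints(n_azimuth=12, n_elevation=3, radius=3.0):
    """Place virtual cameras on a partial sphere around the subject."""
    cams = []
    for el in np.linspace(np.deg2rad(10), np.deg2rad(60), n_elevation):
        for az in np.linspace(0, 2 * np.pi, n_azimuth, endpoint=False):
            cams.append(radius * np.array([np.cos(el) * np.cos(az),
                                           np.cos(el) * np.sin(az),
                                           np.sin(el)]))
    return cams

def look_at(cam, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation for a camera at `cam` looking at `target`."""
    z = target - cam
    z = z / np.linalg.norm(z)            # forward axis
    x = np.cross(z, up)
    x = x / np.linalg.norm(x)            # right axis
    y = np.cross(x, z)                   # camera up axis
    return np.stack([x, y, z])           # rows are the camera axes

def project_pose(joints_3d, cam, focal=500.0):
    """Project 3D joints (J, 3) into one viewpoint with a pinhole camera."""
    pc = (joints_3d - cam) @ look_at(cam).T    # joints in camera coordinates
    return focal * pc[:, :2] / pc[:, 2:3]      # (J, 2) image-plane coordinates

# One mocap pose rendered from every sampled viewpoint yields the kind of
# view-paired examples a view-invariant model can be trained on.
pose = np.random.rand(20, 3) - 0.5             # placeholder 20-joint skeleton
views = [project_pose(pose, c) for c in sample_viewpoints()]
print(len(views), views[0].shape)              # 36 viewpoints, (20, 2) each
```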

Cited by 169 publications (165 citation statements)
References 57 publications

“…The small variation in accuracy across different views demonstrates the view-invariance of the proposed framework, whereas in other state-of-the-art methods [18], [1], [34] the accuracy varies from view to view by roughly 10%, showing that those methods are sensitive to viewpoint. The comparison of recognition accuracy is shown in Tables IV, V, and VI for the NUCLA, UWA3DII, and NTU RGB-D datasets, respectively.…”
Section: NTU RGB-D Human Activity Dataset (mentioning)
Confidence: 83%

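The view-sensitivity measure implied by this statement — per-view accuracy and how far it spreads across views — amounts to a small computation. A rough sketch with made-up variable names (`per_view_preds`, `per_view_labels`) and random placeholder data:

```python
import numpy as np

# Hypothetical per-view predictions and ground-truth labels, one array pair
# per camera viewpoint; in practice these come from the trained classifier.
rng = np.random.default_rng(0)
per_view_preds  = {v: rng.integers(0, 10, 200) for v in ("view1", "view2", "view3")}
per_view_labels = {v: rng.integers(0, 10, 200) for v in ("view1", "view2", "view3")}

acc = {v: float(np.mean(per_view_preds[v] == per_view_labels[v]))
       for v in per_view_preds}

# A view-invariant model keeps this spread small; the statement above reports
# spreads of roughly 10% for view-sensitive baselines.
spread = max(acc.values()) - min(acc.values())
print(acc, f"accuracy spread across views: {spread:.1%}")
```
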
“…In this paper, a shape temporal dynamics (STD) stream is designed to describe the long-term shape dynamics of the action with a deep convolutional neural network (CNN) whose architecture is similar to [18], except that we connect the last (7th) layer to a combination of bidirectional LSTM and LSTM layers. The architecture of our CNN is as follows:…”
Section: Model Architecture and Learning (mentioning)
Confidence: 99%

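The quoted description cuts off before the architecture details, but the stated structure — per-frame CNN features feeding a bidirectional LSTM followed by an LSTM — can be outlined as below. This is a generic PyTorch sketch, not the cited model: the backbone, feature dimension, hidden sizes, and class count are placeholders.

```python
import torch
import torch.nn as nn

class STDStream(nn.Module):
    """CNN per-frame features -> BiLSTM -> LSTM -> action logits."""

    def __init__(self, feat_dim=4096, hidden=512, num_classes=60):
        super().__init__()
        # Any per-frame CNN works here; a small conv stack stands in for the
        # backbone whose last layer produces `feat_dim`-dimensional features.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.lstm = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, clips):                    # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))    # (B*T, feat_dim)
        feats = feats.view(B, T, -1)
        seq, _ = self.bilstm(feats)              # (B, T, 2*hidden)
        seq, _ = self.lstm(seq)                  # (B, T, hidden)
        return self.classifier(seq[:, -1])       # logits from the last time step

logits = STDStream()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)                              # torch.Size([2, 60])
```

Taking the last time step's hidden state for classification is just one common choice; pooling over all time steps would work equally well in this sketch.
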
“…The same action viewed from different angles looks quite different. This issue was addressed in [162] using a CNN. That method generates the training data by fitting synthetic 3D human models to real motion capture data and rendering the human poses from different viewpoints.…”
Section: Discriminative/Supervised Models (mentioning)
Confidence: 99%

“…Rahmani and Mian [35] transferred human poses to a view-invariant high-level space and recognized actions in depth images using a deep convolutional neural network. Their method obtained good results on multi-view datasets.…”
Section: Introduction (mentioning)
Confidence: 99%
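
The idea in this last statement — map each pose into a high-level space where viewpoint is factored out, then recognise the action there — can be pictured with a toy cross-view similarity check. The `encoder` below is an untrained stand-in for the learned CNN and cosine similarity is just one plausible comparison; neither is taken from [35].

```python
import numpy as np

def encoder(depth_frame, W):
    """Stand-in for the learned CNN: a fixed linear map followed by a ReLU.

    A trained view-invariant model would map renderings of the same pose from
    different viewpoints to nearby points in this high-level space.
    """
    return np.maximum(W @ depth_frame.ravel(), 0.0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 64 * 64))      # random projection as a placeholder

# Hypothetical depth frames: the same action seen from two viewpoints,
# plus an unrelated action for contrast.
view_a, view_b = rng.random((64, 64)), rng.random((64, 64))
other_action = rng.random((64, 64))

same = cosine(encoder(view_a, W), encoder(view_b, W))
diff = cosine(encoder(view_a, W), encoder(other_action, W))
print(f"same action across views: {same:.3f}, different action: {diff:.3f}")
# With a trained encoder, `same` should be clearly higher than `diff`;
# the random stand-in here only fixes the shapes and the comparison.
```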