Unsupervised Feature Learning of Human Actions As Trajectories in Pose Embedding Manifold

Kundu, Jogendra Nath; Gor, Maharshi; Uppala, Phani Krishna; Radhakrishnan, Venkatesh Babu

doi:10.1109/wacv.2019.00160

Cited by 43 publications

(19 citation statements)

References 17 publications


“…Considering the Se-BiReNet and all MLP layers used in our learning architecture, the learnable parameters in our method is about 0.27 million. As some details missed in several works, we can only estimate the lowest number of parameters in those methods, such as EnGAN-PoseRNN [8] and AGC-LSTM [20]. It can be seen that our method achieves a competitive result with the least parameters, which also shows the efficiency of our method from another perspective.…”

Section: Unsupervised (mentioning)

confidence: 91%


Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement

Nie

Liu

2020

Computer Vision – ECCV 2020


Learning a good 3D human pose representation is important for human pose related tasks, e.g. human 3D pose estimation and action recognition. Within all these problems, preserving the intrinsic pose information and adapting to view variations are two critical issues. In this work, we propose a novel Siamese denoising autoencoder to learn a 3D pose representation by disentangling the pose-dependent and view-dependent feature from the human skeleton data, in a fully unsupervised manner. These two disentangled features are utilized together as the representation of the 3D pose. To consider both the kinematic and geometric dependencies, a sequential bidirectional recursive network (SeBiReNet) is further proposed to model the human skeleton data. Extensive experiments demonstrate that the learned representation 1) preserves the intrinsic information of human pose, 2) shows good transferability across datasets and tasks. Notably, our approach achieves state-of-the-art performance on two inherently different tasks: pose denoising and unsupervised action recognition. Code and models are available at: https://github.com/NIEQiang001/unsupervised-human-pose.git.


“…LongT GAN [35] 48.1* 40.18M EnGAN-PoseRNN [8] 77.8 >0.7M Ours (1-layer LSTM) 79.71 0.27M RGB+D dataset. Among those unsupervised methods on N-UCLA dataset, our method achieves the best performance with an increment of 18% compared to the work of [9].…”

Section: Unsupervised (mentioning)

confidence: 99%

Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement

Nie

Liu

2020

Computer Vision – ECCV 2020


“…al. [25] learn the action sequence as a trajectory in the pose manifold for the downstream activity classification task. Caetano et al [26] use CNN-based feature representation over a temporal window containing skeleton dynamics.…”

Section: Related Work (mentioning)

confidence: 99%

Quo Vadis, Skeleton Action Recognition?

et al. 2021


In this paper, we study current and upcoming frontiers across the landscape of skeleton-based human action recognition. To begin with, we benchmark state-of-the-art models on the NTU-120 dataset and provide multi-layered assessment of the results. To examine skeleton action recognition 'in the wild', we introduce Skeletics-152, a curated and 3-D pose-annotated subset of RGB videos sourced from Kinetics-700, a large-scale action dataset. The results from benchmarking the top performers of NTU-120 on Skeletics-152 reveal the challenges and domain gap induced by actions 'in the wild'. We extend our study to include out-of-context actions by introducing Skeleton-Mimetics, a dataset derived from the recently introduced Mimetics dataset. Finally, as a new frontier for action recognition, we introduce Metaphorics, a dataset with caption-style annotated YouTube videos of the popular social game Dumb Charades and interpretative dance performances. Overall, our work characterizes the strengths and limitations of existing approaches and datasets. It also provides an assessment of top-performing approaches across a spectrum of activity settings and via the introduced datasets, proposes new frontiers for human action recognition.


“…It impels the exploration of learning skeleton-based action representation in an unsupervised manner [15,24,30,14]. Often unsupervised methods use pretext tasks to generate the supervision signals, such as reconstruction [7,44], autoregression [12,30] and jigsaw puzzles [22,36]. Consequently, the learning highly relies on the quality of the designed pretext tasks, and those tasks are hard to be generalized for different downstream tasks.…”

Section: Introduction (mentioning)

confidence: 99%

Contrastive Positive Mining for Unsupervised 3D Action Representation Learning

Zhang

Hou

Zhang

et al. 2022

Preprint


Recent contrastive based 3D action representation learning has made great progress. However, the strict positive/negative constraint is yet to be relaxed and the use of non-self positive is yet to be explored. In this paper, a Contrastive Positive Mining (CPM) framework is proposed for unsupervised skeleton 3D action representation learning. The CPM identifies non-self positives in a contextual queue to boost learning. Specifically, the siamese encoders are adopted and trained to match the similarity distributions of the augmented instances in reference to all instances in the contextual queue. By identifying the non-self positive instances in the queue, a positive-enhanced learning strategy is proposed to leverage the knowledge of mined positives to boost the robustness of the learned latent space against intra-class and inter-class diversity. Experimental results have shown that the proposed CPM is effective and outperforms the existing state-of-the-art unsupervised methods on the challenging NTU and PKU-MMD datasets.

