2020
DOI: 10.48550/arxiv.2004.11822
Preprint
3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training

Cited by 5 publications (9 citation statements)
References 34 publications
“…Additionally, when using GTs, LiftFormer is still better than other end-to-end models [8,7,46] with 42.9, 40.1 and 39.9 mm MPJPE, respectively, which not only leverage temporal data, but also features extracted from the original RGB images themselves, optical flow or occlusion enhanced heatmaps. Our model also outperforms other SMPL-based approaches like SPIN [20] or ENAS [33], with 41.1mm and 42.4mm MPJPE, respectively, and multi-view methods, like DeepFuse [16] with 37.5mm MPJPE.…”
Section: Comparison With State-of-the-art
Confidence: 96%
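The error figures quoted above are MPJPE (Mean Per Joint Position Error) values, the standard metric in this comparison. A minimal sketch of how it is typically computed, assuming predicted and ground-truth joints are given as (num_joints, 3) arrays in millimetres and already aligned at the root joint (the array shapes and the 17-joint toy skeleton here are illustrative, not taken from the paper):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: the average Euclidean
    distance between predicted and ground-truth 3D joints.
    pred, gt: arrays of shape (num_joints, 3), in mm."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: every joint predicted 10 mm off along the x axis.
gt = np.zeros((17, 3))
pred = gt.copy()
pred[:, 0] += 10.0
print(mpjpe(pred, gt))  # → 10.0
```

Lower is better, which is why LiftFormer's numbers are reported as improvements over the 37.5–42.9 mm range of the compared methods.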
“…Cheng et al [7] improves the state-of-the-art by using a discriminator to assess if the generated poses are valid. Specifically, they use the Kinematic Chain Space (KCS) model, defined in [42], and expand it temporally (TKCS).…”
Section: Related Work
Confidence: 99%
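As a rough illustration of the Kinematic Chain Space idea referenced above: joints are mapped to bone vectors, and the Gram matrix of the bones encodes bone lengths (diagonal) and inter-bone angles (off-diagonal), which a discriminator can assess for plausibility. This sketch uses a hypothetical 4-joint chain, not the skeleton or the temporal extension (TKCS) from the cited papers:

```python
import numpy as np

# Hypothetical 4-joint chain: root -> spine -> neck -> head.
# P holds the 3D joint positions as columns, shape (3, num_joints).
P = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 2.0, 2.5],
              [0.0, 0.0, 0.0, 0.5]])

# C maps joints to bones: each column produces (child - parent).
C = np.array([[-1.0,  0.0,  0.0],
              [ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0],
              [ 0.0,  0.0,  1.0]])

B = P @ C        # bone vectors, shape (3, num_bones)
Psi = B.T @ B    # KCS matrix: diagonal = squared bone lengths,
                 # off-diagonal = dot products between bones (angles)
print(np.diag(Psi))  # squared lengths of the three bones
```

Because Psi is invariant to the global position of the skeleton, it gives the discriminator a compact view of limb lengths and joint angles rather than raw coordinates.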
“…To make this approach applicable to personalized gesture-based retrieval systems, it can be extended to monocular video captured by accessible devices such as a mobile phone camera. This approach would be feasible due to recent progress in the area of 3D human pose estimation in predicting the body joint coordinates from a monocular video [37][38][39]. This would then allow future recommendation systems to take embodied processes into account, resulting in better and more responsive personalized experiences.…”
Section: Root Torso
Confidence: 99%