2020
DOI: 10.48550/arxiv.2009.00348
Preprint

LiftFormer: 3D Human Pose Estimation using attention models

Adrian Llopart

Abstract: Estimating the 3D position of human joints has become a widely researched topic in recent years. Special emphasis has gone into defining novel methods that extrapolate 2-dimensional data (keypoints) into 3D, namely predicting the root-relative coordinates of joints associated with human skeletons. The latest research trends have proven that Transformer Encoder blocks aggregate temporal information significantly better than previous approaches. Thus, we propose the use of these models to obtain more accurate…
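
The lifting idea the abstract describes (Transformer encoder blocks aggregating temporal information over a window of 2D keypoints) can be pictured with a minimal PyTorch sketch. This is not the authors' LiftFormer implementation: the joint count, window length, layer widths and the learned positional encoding are illustrative assumptions.

```python
# Minimal sketch of a Transformer-encoder "lifting" model: a window of 2D
# keypoints is embedded per frame, passed through Transformer encoder blocks
# to aggregate temporal context, and the centre frame's root-relative 3D
# joints are regressed. Sizes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class KeypointLifter(nn.Module):
    def __init__(self, num_joints=17, window=9, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Per-frame embedding: flatten (num_joints, 2) -> d_model.
        self.embed = nn.Linear(num_joints * 2, d_model)
        # Learned positional encoding over the temporal window (assumption).
        self.pos = nn.Parameter(torch.zeros(1, window, d_model))
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Regress root-relative 3D coordinates for the centre frame.
        self.head = nn.Linear(d_model, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, kp2d):                # kp2d: (batch, window, num_joints, 2)
        b, t, j, _ = kp2d.shape
        x = self.embed(kp2d.reshape(b, t, j * 2)) + self.pos[:, :t]
        x = self.encoder(x)                 # temporal self-attention across frames
        centre = x[:, t // 2]               # token of the centre frame
        return self.head(centre).reshape(b, self.num_joints, 3)

lifter = KeypointLifter()
pose3d = lifter(torch.randn(2, 9, 17, 2))   # -> (2, 17, 3)
```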

Cited by 3 publications (5 citation statements)
References 42 publications

“…To make the algorithm invariant to the camera angle, the 3D joints were rotated to bring the vector between the hip joints into the frontal plane and to align the vector from the mid hip to the sternum vertically in the frontal plane. The 3D joint locations were tokenized as in [27]: by concatenating together the joint locations for each frame into a per-frame vector and passing them through an MLP to match the transformer embedding dimension, and using sinusoidal embedding for positional encoding. To include the subject's height, we provided a token of height embedded with an additional MLP and a learned positional embedding.…”
Section: Methods: Gait Analysis Pipeline and Training
Confidence: 99%
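
The tokenization the citing authors describe maps roughly onto the following PyTorch sketch: per-frame joint vectors projected by an MLP to the transformer width with sinusoidal positional encodings, plus a height token produced by a separate MLP with its own learned position. The camera-aligning rotation is assumed to have been applied beforehand, and all sizes (17 joints, 256-dimensional embedding) are assumptions rather than the cited pipeline's actual configuration.

```python
# Rough sketch of the tokenization described above: each frame's 3D joints are
# flattened into one vector, projected by an MLP to the transformer width,
# combined with sinusoidal positional encodings, and a height token (its own
# MLP + learned position) is prepended. Sizes are illustrative assumptions.
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(length, d_model):
    # Standard fixed sinusoidal positional encoding, shape (length, d_model).
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(length, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class GaitTokenizer(nn.Module):
    def __init__(self, num_joints=17, d_model=256):
        super().__init__()
        # Per-frame MLP: flattened (num_joints * 3) coordinates -> d_model.
        self.frame_mlp = nn.Sequential(
            nn.Linear(num_joints * 3, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model))
        # Separate MLP for the scalar height, plus its learned position.
        self.height_mlp = nn.Sequential(
            nn.Linear(1, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        self.height_pos = nn.Parameter(torch.zeros(1, 1, d_model))

    def forward(self, joints3d, height):    # joints3d: (B, T, J, 3), height: (B, 1)
        b, t, j, _ = joints3d.shape
        tokens = self.frame_mlp(joints3d.reshape(b, t, j * 3))
        tokens = tokens + sinusoidal_encoding(t, tokens.size(-1)).to(tokens)
        height_tok = self.height_mlp(height).unsqueeze(1) + self.height_pos
        return torch.cat([height_tok, tokens], dim=1)    # (B, T + 1, d_model)

tok = GaitTokenizer()
seq = tok(torch.randn(4, 30, 17, 3), torch.full((4, 1), 1.75))   # -> (4, 31, 256)
```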
“…Deep learning [22], multi-scale temporal features, spatio-temporal KCS pose differentiation, and occlusion data augmentation [29] have been used in the development of 2D-to-3D human pose estimation [30,31]. Other methods use attention models [32] and multi-scale networks with phase inference optimization [33], introducing many parameters that require manual tuning. The performance of the graphical model-based approach has been surpassed by convolutional neural networks (CNNs) [31,34].…”
Section: Related Work
Confidence: 99%
“…3D human pose estimation is mainly categorized into top-down and bottom-up methods. Top-down methods use a cropped bounding box as input that contains a single person (Li and Chan 2014; Sun et al. 2017; Pavlakos et al. 2017; Sun et al. 2018; Moon, Chang, and Lee 2019; Martinez et al. 2017; Nie, Wei, and Zhu 2017; Gong, Zhang, and Feng 2021; Llopart 2020). Meanwhile, bottom-up methods estimate the keypoints of all persons in the input image and then group them into per-person sets (Fabbri et al. 2020; Lin and Lee 2020; Mehta et al. 2020; Wang et al. 2010).…”
Section: Introduction
Confidence: 99%
“…The alternative is a two-stage approach with a lifting network (Martinez et al. 2017; Nie, Wei, and Zhu 2017; Llopart 2020; Gong, Zhang, and Feng 2021). The two-stage method first estimates 2D keypoint coordinates and then translates the 2D coordinates into 3D coordinates using an additional lifting network.…”
Section: Introduction
Confidence: 99%
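
The two-stage pipeline this passage refers to can be sketched as follows: an off-the-shelf 2D detector (represented here by a placeholder) produces keypoints, and a separate lifting network regresses root-relative 3D joints, loosely in the spirit of the fully connected lifter of Martinez et al. 2017. The dummy detector and all layer sizes are assumptions, not any cited paper's exact model.

```python
# Sketch of the two-stage idea: Stage 1 detects 2D keypoints with any backend;
# Stage 2 lifts the per-frame 2D coordinates to root-relative 3D coordinates
# with a small fully connected residual network. Sizes are illustrative.
import torch
import torch.nn as nn

class LiftingNetwork(nn.Module):
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.inp = nn.Linear(num_joints * 2, hidden)
        # One residual block of two linear layers, in the style of FC lifters.
        self.block = nn.Sequential(
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(), nn.Dropout(0.5))
        self.out = nn.Linear(hidden, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, kp2d):                    # kp2d: (B, J, 2)
        b = kp2d.size(0)
        h = torch.relu(self.inp(kp2d.reshape(b, -1)))
        h = h + self.block(h)                   # residual connection
        return self.out(h).reshape(b, self.num_joints, 3)

def estimate_pose_3d(image, detector_2d, lifter):
    """Stage 1: detect 2D keypoints; Stage 2: lift them to 3D."""
    kp2d = detector_2d(image)                   # (B, J, 2), any 2D pose backend
    return lifter(kp2d)

# Usage with a dummy "detector" standing in for a real 2D pose estimator.
lifter = LiftingNetwork().eval()
dummy_detector = lambda img: torch.rand(1, 17, 2)
with torch.no_grad():
    pose3d = estimate_pose_3d(torch.zeros(1, 3, 256, 256), dummy_detector, lifter)
```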