2020
DOI: 10.48550/arxiv.2012.11806
Preprint
Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos

Abstract: Despite recent progress, 3D multi-person pose estimation from monocular videos remains challenging due to the commonly encountered problem of missing information caused by occlusion, partially out-of-frame target persons, and inaccurate person detection. To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses without requiring camera parameters. In particula…

Cited by 1 publication (2 citation statements)
References 48 publications
“…These blocks are effectively combined with temporal dependencies to address depth ambiguity and overcome self-occlusion. Likewise, Cheng et al [146] also present a method that combines GCNs and TCNs, but for 3D multi-person pose estimation in monocular videos. The framework introduces two types of GCNs: a human-joint GCN and a human-bone GCN.…”
Section: Methods Based on GCNs
confidence: 99%
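The human-joint and human-bone GCNs described in the statement above operate on skeleton graphs. As an illustration only (the joint chain, feature dimensions, and weights below are assumptions, not the paper's actual configuration), a single graph-convolution layer over a toy joint graph can be sketched as:

```python
import numpy as np

# Toy skeleton: a chain of 5 joints connected by 4 bones.
# These are illustrative choices, not the paper's skeleton definition.
NUM_JOINTS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]

def normalized_adjacency(num_nodes, edges):
    """Symmetrically normalized adjacency with self-loops:
    D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix."""
    A = np.eye(num_nodes)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return d_inv_sqrt @ A @ d_inv_sqrt

def gcn_layer(X, A_hat, W):
    """One graph-convolution layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(A_hat @ X @ W, 0.0)

# X: per-joint 2D input features; W: (random stand-in for learned) weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((NUM_JOINTS, 2))
W = rng.standard_normal((2, 3))
A_hat = normalized_adjacency(NUM_JOINTS, EDGES)
H = gcn_layer(X, A_hat, W)
print(H.shape)  # (5, 3): one 3-dim feature per joint
```

A human-bone GCN would apply the same operation to a graph whose nodes are bones rather than joints; only the adjacency changes.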
“…Similar to the above methods for depth estimation, HMORs (hierarchical multi-person ordinal relations) [145] employs an integrated top-down model to estimate human bounding boxes, depths, and root-relative 3D poses simultaneously. Its coarse-to-fine architecture, instead of relying on image features as the above methods do, hierarchically estimates multi-person ordinal relations of depths and angles, capturing body-part and joint-level semantics while maintaining global consistency to improve depth estimation. The framework proposed for 3D multi-person pose estimation in [146] combines GCNs and TCNs to estimate camera-centric poses without requiring camera parameters. Its GCNs estimate frame-wise 3D poses, while its TCNs enforce temporal and human-dynamics constraints across frames: a joint-TCN estimates person-centric 3D poses, and a root-TCN estimates camera-centric 3D poses.…”
Section: Top-down Approaches
confidence: 99%
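The joint-TCN and root-TCN mentioned above both apply temporal convolutions to per-frame pose features. A minimal sketch of the underlying operation (kernel size, weights, and feature dimensions are assumptions for illustration, not the paper's settings):

```python
import numpy as np

def temporal_conv(seq, kernel):
    """Valid 1D convolution along the time axis, applied independently
    to each feature channel — the basic building block of a TCN layer."""
    T, C = seq.shape
    k = kernel.shape[0]
    out = np.empty((T - k + 1, C))
    for t in range(T - k + 1):
        out[t] = (kernel[:, None] * seq[t:t + k]).sum(axis=0)
    return out

# 10 frames of a 3-channel feature (e.g. a root-joint trajectory),
# here a simple linear ramp so the effect of smoothing is visible.
frames = np.arange(10, dtype=float)[:, None].repeat(3, axis=1)
smooth = temporal_conv(frames, np.array([0.25, 0.5, 0.25]))
print(smooth.shape)  # (8, 3): 'valid' convolution shortens the sequence
```

Stacking such layers with increasing receptive fields lets a TCN impose smoothness and human-dynamics constraints on per-frame GCN estimates.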