Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475355

Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation

Abstract: 3D human pose and shape recovery from a monocular RGB image is a challenging task. Because paired 3D supervision is scarce for in-the-wild images, existing learning-based methods depend heavily on weak supervision signals, e.g., 2D and 3D joint locations. However, these weak labels carry 2D-to-3D ambiguities, so a network trained with them easily gets stuck in local optima. In this paper, we reduce the ambiguity by optimizing multiple initializations. Specifically, we propose …
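The abstract's core argument is that a single starting point can settle into a poor local optimum, while optimizing from several initializations and keeping the best result reduces that risk. The sketch below illustrates only this generic multi-start idea on a toy one-dimensional loss. It is not the paper's network (the method description is truncated above); `loss`, `grad`, `descend`, and `multi_start` are hypothetical stand-ins, and plain gradient descent replaces whatever optimization the authors actually use.

```python
import math
from typing import List, Tuple


def loss(x: float) -> float:
    # Hypothetical non-convex toy loss with several local minima,
    # standing in for an ambiguous 2D-to-3D fitting objective.
    return math.sin(3.0 * x) + 0.1 * x * x


def grad(x: float) -> float:
    # Analytic derivative of loss(x).
    return 3.0 * math.cos(3.0 * x) + 0.2 * x


def descend(x0: float, steps: int = 500, lr: float = 0.01) -> float:
    # Plain gradient descent from a single initialization.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def multi_start(inits: List[float]) -> Tuple[float, float]:
    # Optimize every initialization independently, then keep the
    # candidate with the lowest loss.
    candidates = [descend(x0) for x0 in inits]
    best = min(candidates, key=loss)
    return best, loss(best)


if __name__ == "__main__":
    single = descend(2.0)  # one start: stays in a nearby local minimum
    best, best_loss = multi_start([-3.0, -1.5, 0.0, 1.5, 3.0])
    print(f"single start: x={single:.3f}, loss={loss(single):.3f}")
    print(f"multi-start:  x={best:.3f}, loss={best_loss:.3f}")
```

Starting only from `x = 2.0`, descent converges to the local minimum near `x ≈ 1.54`, whereas the multi-start run also explores the basin around `x ≈ -0.51`, which has a lower loss. The selection step is a simple `min` over candidate losses; a learned model would instead need some scoring or supervision signal to rank its hypotheses.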

Cited by 6 publications (2 citation statements)
References 45 publications
“…Some recent works have focused on improving the temporal modeling of pose estimation by incorporating attention mechanisms, or spatio-temporal convolutions (Luvizon et al, 2018;Zhang et al, 2022;Liu et al, 2022). In addition, some research has also focused on addressing practical challenges, such as occlusions, partial observations, and noisy data, by incorporating robust estimation methods, priors, or data augmentation techniques (Cheng et al, 2019b;Liu et al, 2021b;Li et al, 2022).…”
Section: Module 2: Multi-frames Based Estimation (mentioning; confidence: 99%)
“…Despite the high coverage of surveillance cameras, human activity detections still rampage in places with a less sufficient police force, even directly under high-resolution cameras. Now, as the field of Artificial Intelligence and Computer Vision develops, new solutions such as Biometric Identification [3,4,5], Object Detection/Tracking [6,7,8], Crowd Density Analysis [9,10,11,12,13], and Action Recognition [14,15,16,17,18,19] have come to light, which in theory could automatically detect objects or actions. However, this seemingly useful technology remains confined to the laboratories as a consequence of deficiencies: 1) The narrow scope of the detection process (e.g., action-only or object-only) cripples the model accuracy, for it requires a comprehensive consideration of multiple factors to determine the nature of a situation.…”
Section: Introduction (mentioning; confidence: 99%)