PaMIR: Parametric Model-Conditioned Implicit Representation for Image-Based Human Reconstruction

Zheng, Zerong; Yu, Tao; Liu, Yebin; Dai, Qionghai

doi:10.1109/tpami.2021.3050505

Cited by 147 publications

(135 citation statements)

References 73 publications

(116 reference statements)

Supporting

Mentioning

132

Contrasting

“…[23,27,38,25,15,43,26,47,31] learn to infer body pose and shape from a single image, but only consider minimally clothed human. Various methods [48,60,6,42,41,18,21,59,28,36,13] have recently been proposed to reconstruct human in clothing. BodyNet [48] and DeepHuman [60] output human shape in the form of occupancy voxel grids.…”

Section: Learning-based Approaches With Monocular RGB (mentioning)

confidence: 99%

“…Such representation has difficulties to capture fine details due to the high memory footprint. Neural implicit functions have been introduced to replace an explicit voxel grid and have enabled high-fidelity reconstructions from single images [42,41,18,21,59,28]. A major limitation of these methods is the lack of generalization to unseen poses in the wild.…”

Section: Learning-based Approaches With Monocular RGB (mentioning)

confidence: 99%

Human Performance Capture from Monocular Video in the Wild

Guo, Chen, Song, et al. 2021

2021 International Conference on 3D Vision (3DV)

Capturing the dynamically deforming 3D shape of clothed human is essential for numerous applications, including VR/AR, autonomous driving, and human-computer interaction. Existing methods either require a highly specialized capturing setup, such as expensive multi-view imaging systems, or they lack robustness to challenging body poses. In this work, we propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses, without any additional input. We first build a 3D template human model of the subject based on a learned regression model. We then track this template model's deformation under challenging body articulations based on 2D image observations. Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW. Moreover, we demonstrate its efficacy in robustness and generalizability on videos from iPER datasets.

“…Wang et al [ 29 ] introduced an adversarial learning framework based on normal maps, which not only improves the front view depth de-noising performance but also infers back view depth images with impressive geometric details. Onizuka et al [ 26 ] combined a CNN (convolutional neural networks) and PCN (corresponding part connection network) to learn a distribution of the TSDF in the tetrahedral volume from a single image. Huang et al [ 27 ] used parametric 3D human body estimation to construct the semantic space and semantic deformation field, which allows the 2D/3D human body to be converted into a canonical space to reduce geometric blur caused by occlusion in pose changes.…”

Section: Related Research (mentioning)

confidence: 99%

“…At present, the method that uses a single RGB image as the input is the mainstream, and the ambiguity of the scale of RGB images is an unavoidable limitation. Moreover, using only RGB images to restore the geometric details of the model does not seem to be a reliable method [ 21 , 22 , 23 , 24 , 26 , 27 , 28 ].…”

Section: Related Research (mentioning)

confidence: 99%

“…These data-driven methods already encode prior information about the human body, such as posture and body shape. However, the methods based on only RGB input [ 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 ] did not achieve a reliable body shape estimation due to the ambiguity of scale. For another reason, the methods that employed depth input [ 30 , 31 ] have also made some progress, but the recovery of details is unsatisfactory.…”

Section: Introduction (mentioning)

confidence: 99%

Human Motion Tracking with Less Constraint of Initial Posture from a Single RGB-D Sensor

Liu, Wang, et al. 2021

Sensors

High-quality and complete human motion 4D reconstruction is of great significance for immersive VR and even human operation. However, it has inevitable self-scanning constraints, and tracking under monocular settings also has strict restrictions. In this paper, we propose a human motion capture system combined with human priors and performance capture that only uses a single RGB-D sensor. To break the self-scanning constraint, we generated a complete mesh only using the front view input to initialize the geometric capture. In order to construct a correct warping field, most previous methods initialize their systems in a strict way. To maintain high fidelity while increasing the easiness of the system, we updated the model while capturing motion. Additionally, we blended in human priors in order to improve the reliability of model warping. Extensive experiments demonstrated that our method can be used more comfortably while maintaining credible geometric warping and remaining free of self-scanning constraints.

Bidirectional temporal feature for 3D human pose and shape estimation from a video

Sun, Tang, et al. 2023

Computer Animation & Virtual

3D human pose and shape estimation is the foundation of analyzing human motion. However, estimating accurate and temporally consistent 3D human motion from a video remains a challenge. To date, most video-based methods for estimating 3D human pose and shape rely on unidirectional temporal features and lack more comprehensive information. To solve this problem, we propose a novel model, "bidirectional temporal feature for human motion recovery" (BTMR), which consists of a human motion generator and a discriminator. The transformer-based generator effectively captures the forward and reverse temporal features to enhance the temporal correlation between frames and reduce the loss of spatial information. The motion discriminator, based on Bi-LSTM, can distinguish whether the generated pose sequences are consistent with the realistic sequences of the AMASS dataset. In the process of continuous generation and discrimination, the model can learn more realistic and accurate poses. We evaluate our BTMR on the 3DPW and MPI-INF-3DHP datasets. Without the training set of 3DPW, BTMR outperforms VIBE by 2.4 mm and 14.9 mm/s² in the PA-MPJPE and Accel metrics and outperforms TCMR by 1.7 mm in the PA-MPJPE metric on 3DPW. The results demonstrate that our BTMR produces more accurate and temporally consistent 3D human motion.
