3D Human Body Reconstruction from a Single Image via Volumetric Regression

Jackson, Aaron S.; Manafas, Chris; Tzimiropoulos, Georgios

doi:10.1007/978-3-030-11018-5_6

Cited by 67 publications

(70 citation statements)

References 32 publications

“…Implementation Details. The image encoders for both the low-resolution and high-resolution levels use a stacked hourglass network [31] with 4 and 1 stacks respectively, using the modification suggested by [16] and batch normalization replaced with group normalization [45]. Note that the fine image encoder removes one downsampling operation to achieve large feature embedding resolution.…”

Section: Results (mentioning)

confidence: 99%

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

Saito, Huang, Natsume, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Recent advances in image-based 3D human shape estimation have been driven by the significant improvement in representation power afforded by deep neural networks. Although current approaches have demonstrated the potential in real-world settings, they still fail to produce reconstructions with the level of detail often present in the input images. We argue that this limitation stems primarily from two conflicting requirements: accurate predictions require large context, but precise predictions require high resolution. Due to memory limitations in current hardware, previous approaches tend to take low-resolution images as input to cover large spatial context, and produce less precise (or low-resolution) 3D estimates as a result. We address this limitation by formulating a multi-level architecture that is end-to-end trainable. A coarse level observes the whole image at lower resolution and focuses on holistic reasoning. This provides context to a fine level, which estimates highly detailed geometry by observing higher-resolution images. We demonstrate that our approach significantly outperforms existing state-of-the-art techniques on single-image human shape reconstruction by fully leveraging 1k-resolution input images.

“…The MLP for the fine-level image encoder has the number of neurons of (272, 512, 256, 128, 1) with skip connections at second and third layers. Note … 16, resulting in the input channel size of 272 in total. The coarse PIFu module is pre-trained with the input image resized to 512 × 512 and a batch size of 8.…” (Footnote 1 in the quoted source: https://hdrihaven.com/)

Section: Results (mentioning)

confidence: 99%
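
The layer widths quoted in the statement above can be turned into concrete weight shapes. The following is a minimal sketch, not the citing authors' implementation. It assumes that a "skip connection" at a layer means the original 272-channel input is concatenated to that layer's input, and that layers are counted from 1; the quoted text confirms neither. The names `layer_shapes` and `parameter_count` are hypothetical helpers introduced only for this illustration.

```python
def layer_shapes(widths, skip_layers):
    """Return (fan_in, fan_out) for each fully connected layer.

    Assumption: a skip connection at layer i concatenates the original
    input features (widths[0] channels) to that layer's input.
    Layers are numbered from 1.
    """
    input_dim = widths[0]
    shapes = []
    for index, (prev, out) in enumerate(zip(widths[:-1], widths[1:]), start=1):
        fan_in = prev + (input_dim if index in skip_layers else 0)
        shapes.append((fan_in, out))
    return shapes


def parameter_count(shapes):
    """Count weights plus one bias per output unit for every layer."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


# Widths taken from the quoted statement; skip placement is an assumption.
WIDTHS = (272, 512, 256, 128, 1)
SKIP_LAYERS = {2, 3}

shapes = layer_shapes(WIDTHS, SKIP_LAYERS)
for number, (fan_in, fan_out) in enumerate(shapes, start=1):
    print(f"layer {number}: {fan_in} -> {fan_out}")
print("parameters:", parameter_count(shapes))
```

Under these assumptions the second and third layers take 784 and 528 input channels respectively. A different reading of "skip connection" (for example, an additive residual) would leave every layer's input width equal to the previous layer's output width.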

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

Saito, Huang, Natsume, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

“…Regarding single-view human model reconstruction, there are only two recent works by Varol et al [64] and Jackson et al [26]. In the former study, the 3D human datasets used for the training process are essentially synthesized human imagery textured over SMPL models (lacking geometry details), leading to SMPL-like voxel geometries in their outputs.…”

Section: Related Work (mentioning)

confidence: 99%

DeepHuman: 3D Human Reconstruction From a Single Image

Zheng, Wei, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

We propose DeepHuman, an image-guided volume-to-volume translation CNN for 3D human reconstruction from a single RGB image. To reduce the ambiguities associated with the surface geometry reconstruction, even for the reconstruction of invisible areas, we propose and leverage a dense semantic representation generated from the SMPL model as an additional input. One key feature of our network is that it fuses different scales of image features into the 3D space through volumetric feature transformation, which helps to recover accurate surface geometry. The visible surface details are further refined through a normal refinement network, which can be concatenated with the volume generation network using our proposed volumetric normal projection layer. We also contribute THuman, a 3D real-world human model dataset containing about 7000 models. The network is trained using training data generated from the dataset. Overall, due to the specific design of our network and the diversity in our dataset, our method enables 3D human model estimation given only a single image and outperforms state-of-the-art approaches.

“…BodyNet [35] is an end-to-end network that infers volumetric body shape from a single image. By extending a face reconstruction network [37], Jackson et al [38] also propose a volume-based human shape reconstruction method. These two methods present 3D objects as voxel representation rather than mesh.…”

Section: B Non-parametric Approach (mentioning)

confidence: 99%

Non-Parametric Anthropometric Graph Convolutional Network for Virtual Mannequin Reconstruction

Xie, Zhong, et al. 2020

IEEE Access

In this paper, we present a novel non-parametric method for precisely reconstructing a three-dimensional (3D) virtual mannequin from anthropometric measurements and mask image(s) based on a Graph Convolution Network (GCN). The proposed method avoids heavy dependence on a particular parametric body model such as SMPL or SCAPE and can predict mesh vertices directly, which is significantly easier with a GCN than with a typical Convolutional Neural Network (CNN). To further improve the accuracy of the reconstruction and make the reconstruction more controllable, we incorporate the anthropometric measurements into the developed GCN. Our non-parametric reconstruction results distinctly outperform the previous graph convolution method, both visually and in terms of anthropometric accuracy. We also demonstrate that the proposed network can reconstruct a plausible 3D mannequin from a single-view mask. The proposed method can also be effortlessly extended to a parametric method by appending a Multilayer Perceptron (MLP) that regresses the parametric space of the Principal Component Analysis (PCA) model to achieve 3D reconstruction. Extensive experimental results demonstrate that our anthropometric GCN itself is very useful in improving the reconstruction accuracy, and that the proposed method is effective and robust for 3D mannequin reconstruction.

Index terms: graph convolution network, non-parametric mannequin reconstruction, anthropometric mannequin design, parametric reconstruction.
