Proceedings of the British Machine Vision Conference 2017
DOI: 10.5244/c.31.99
SilNet: Single- and Multi-View Reconstruction by Learning from Silhouettes

Abstract: The objective of this paper is 3D shape understanding from single and multiple images. To this end, we introduce a new deep-learning architecture and loss function, SilNet, that can handle multiple views in an order-agnostic manner. The architecture is fully convolutional, and for training we use a proxy task of silhouette prediction, rather than directly learning a mapping from 2D images to 3D shape as has been the target in most recent work. We demonstrate that with the SilNet architecture there is generalis…
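A minimal numpy sketch of the two ideas the abstract names: order-agnostic fusion of per-view features (here via element-wise max, one common permutation-invariant choice) and a silhouette-prediction proxy loss (binary cross-entropy against a ground-truth mask). All names and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_views(features):
    """Order-agnostic fusion of per-view feature maps.

    features: (n_views, C, H, W). An element-wise max over the view axis
    is permutation-invariant, so the fused feature does not depend on the
    order in which the input images arrive.
    """
    return features.max(axis=0)

def silhouette_bce(pred_logits, target_mask):
    """Silhouette-prediction proxy loss: binary cross-entropy between a
    predicted silhouette (as logits) and a ground-truth binary mask,
    instead of direct 2D-to-3D supervision."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))  # sigmoid
    p = np.clip(p, 1e-7, 1.0 - 1e-7)        # numerical safety
    return float(-np.mean(target_mask * np.log(p)
                          + (1.0 - target_mask) * np.log(1.0 - p)))
```

Because max is commutative and associative, `fuse_views(x)` gives the same result for any reordering of the views.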


Cited by 71 publications (64 citation statements). References 38 publications.
“…Sculpture dataset The surfaces in the blobby shape dataset are usually largely smooth and lack detail. To provide more complex (realistic) normal distributions for training, we employed 8 complicated 3D models from the sculpture shape dataset introduced in [11]. We generated samples for the sculpture dataset in exactly the same way we did for the blobby shape dataset, except that we discarded views containing holes or showing uniform normals (e.g., flat facets).…”
Section: Synthetic Data For Training
confidence: 99%
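The filtering rule the quote describes (discard views with holes in the silhouette or near-uniform visible normals) can be sketched as below. The hole test, the variance criterion, and the threshold are illustrative assumptions; the cited papers do not specify the exact rule.

```python
import numpy as np

def has_holes(mask):
    """True if the foreground mask encloses background pixels: flood-fill
    the background inward from the image border; any background pixel the
    fill cannot reach is a hole."""
    bg = ~mask
    reach = np.zeros_like(bg)
    reach[0, :], reach[-1, :] = bg[0, :], bg[-1, :]
    reach[:, 0] |= bg[:, 0]
    reach[:, -1] |= bg[:, -1]
    while True:
        grow = reach.copy()
        grow[1:, :] |= reach[:-1, :]   # dilate one step in each direction
        grow[:-1, :] |= reach[1:, :]
        grow[:, 1:] |= reach[:, :-1]
        grow[:, :-1] |= reach[:, 1:]
        grow &= bg                     # stay inside the background
        if (grow == reach).all():
            return bool((bg & ~reach).any())
        reach = grow

def keep_view(normal_map, mask, var_threshold=1e-3):
    """Accept a rendered view unless its silhouette has holes or its
    visible normals are near-uniform (e.g. a flat facet).

    normal_map: (H, W, 3) per-pixel normals; mask: (H, W) bool foreground.
    The variance threshold is a toy value, not from the papers.
    """
    if has_holes(mask):
        return False
    fg = normal_map[mask]
    return fg.shape[0] > 0 and fg.var(axis=0).sum() >= var_threshold
```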
“…To fuse the deep features from multiple images, both 3D-R2N2 [6] and LSM [15] apply the recurrent unit GRU, resulting in the networks being permutation-variant and inefficient for aggregating long sequences of images. The recent SilNet [28] and DeepMVS [11] simply use max pooling to preserve the first-order information of the deep features of multiple images, while RayNet [21] applies average pooling to preserve the first-moment information of multiple deep features. MVSNet [31] proposes a variance-based approach to capture the second-moment information for multiple-feature aggregation.…”
Section: Related Work
confidence: 99%
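The three pooling operators contrasted in the quote can be sketched side by side; all three are permutation-invariant over the view axis, unlike a GRU. Shapes and names here are illustrative, not the papers' implementations.

```python
import numpy as np

def aggregate(feats, mode):
    """Permutation-invariant aggregation of per-view deep features.

    feats: (n_views, D).
    'max'  keeps first-order extremes (the max pooling the quote
           attributes to SilNet / DeepMVS),
    'mean' keeps the first moment (RayNet's average pooling),
    'var'  keeps the second moment (MVSNet's variance-based measure).
    """
    if mode == "max":
        return feats.max(axis=0)
    if mode == "mean":
        return feats.mean(axis=0)
    if mode == "var":
        centred = feats - feats.mean(axis=0, keepdims=True)
        return (centred ** 2).mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")
```

Shuffling the views leaves each result unchanged, which is exactly the property a GRU-based aggregator lacks.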
“…To compare with the existing GRU module [6][15] and the widely used max/mean/sum pooling operations [28][11][21], we replace the GRU module of 3D-R2N2 with our fc-based AttSets and with the three max/mean/sum poolings, keeping all other neural layers untouched. Architecture details are in Appendix A.…”
Section: Comparison With GRU And Pooling Operations
confidence: 99%
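A hedged sketch of the fc-based attention aggregation the quote refers to: a fully connected layer scores each per-view feature element-wise, a softmax over the view axis turns the scores into attention weights, and the output is the weighted sum. `W` and `b` are toy stand-ins for learned parameters, not the published weights.

```python
import numpy as np

def attsets(feats, W, b):
    """AttSets-style aggregation (sketch).

    feats: (n_views, D); W: (D, D); b: (D,). Returns a (D,) fused feature.
    The softmax and sum both run over the view axis, so the result is
    invariant to view order, unlike a recurrent (GRU) aggregator.
    """
    scores = feats @ W + b                       # (n_views, D) attention logits
    scores = scores - scores.max(axis=0, keepdims=True)  # numerically stable
    attn = np.exp(scores)
    attn /= attn.sum(axis=0, keepdims=True)      # softmax over views
    return (attn * feats).sum(axis=0)            # attention-weighted sum
```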
“…More related to our work, Favreau et al [21] apply PCA to silhouette images to extract animal gaits from video sequences. The task of predicting silhouette images from 2D input has been effectively used as a proxy for regressing 3D model parameters for humans [22,23] and other 3D objects [24].…”
Section: Related Work
confidence: 99%