Proceedings of the British Machine Vision Conference 2017
DOI: 10.5244/c.31.99
SilNet: Single- and Multi-View Reconstruction by Learning from Silhouettes

Abstract: The objective of this paper is 3D shape understanding from single and multiple images. To this end, we introduce a new deep-learning architecture and loss function, SilNet, that can handle multiple views in an order-agnostic manner. The architecture is fully convolutional, and for training we use a proxy task of silhouette prediction, rather than directly learning a mapping from 2D images to 3D shape as has been the target in most recent work. We demonstrate that with the SilNet architecture there is generalis…
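A minimal numpy sketch of the two ideas the abstract names: order-agnostic fusion of per-view features (here via element-wise max, one common permutation-invariant choice) and a silhouette-prediction proxy loss (binary cross-entropy against a ground-truth mask). All names and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_views(features):
    """Order-agnostic fusion of per-view feature maps.

    features: (n_views, C, H, W). An element-wise max over the view axis
    is permutation-invariant, so the fused feature does not depend on the
    order in which the input images arrive.
    """
    return features.max(axis=0)

def silhouette_bce(pred_logits, target_mask):
    """Silhouette-prediction proxy loss: binary cross-entropy between a
    predicted silhouette (as logits) and a ground-truth binary mask,
    instead of direct 2D-to-3D supervision."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))  # sigmoid
    p = np.clip(p, 1e-7, 1.0 - 1e-7)        # numerical safety
    return float(-np.mean(target_mask * np.log(p)
                          + (1.0 - target_mask) * np.log(1.0 - p)))
```

Because max is commutative and associative, `fuse_views(x)` gives the same result for any reordering of the views.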


Cited by 71 publications (64 citation statements). References 38 publications.
“…Sculpture dataset The surfaces in the blobby shape dataset are usually largely smooth and lack detail. To provide more complex (realistic) normal distributions for training, we employed 8 complicated 3D models from the sculpture shape dataset introduced in [11]. We generated samples for the sculpture dataset in exactly the same way we did for the blobby shape dataset, except that we discarded views containing holes or showing uniform normals (e.g., flat facets).…”
Section: Synthetic Data For Training
confidence: 99%
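The filtering rule the quote describes (discard views with holes in the silhouette or near-uniform visible normals) can be sketched as below. The hole test, the variance criterion, and the threshold are illustrative assumptions; the cited papers do not specify the exact rule.

```python
import numpy as np

def has_holes(mask):
    """True if the foreground mask encloses background pixels: flood-fill
    the background inward from the image border; any background pixel the
    fill cannot reach is a hole."""
    bg = ~mask
    reach = np.zeros_like(bg)
    reach[0, :], reach[-1, :] = bg[0, :], bg[-1, :]
    reach[:, 0] |= bg[:, 0]
    reach[:, -1] |= bg[:, -1]
    while True:
        grow = reach.copy()
        grow[1:, :] |= reach[:-1, :]   # dilate one step in each direction
        grow[:-1, :] |= reach[1:, :]
        grow[:, 1:] |= reach[:, :-1]
        grow[:, :-1] |= reach[:, 1:]
        grow &= bg                     # stay inside the background
        if (grow == reach).all():
            return bool((bg & ~reach).any())
        reach = grow

def keep_view(normal_map, mask, var_threshold=1e-3):
    """Accept a rendered view unless its silhouette has holes or its
    visible normals are near-uniform (e.g. a flat facet).

    normal_map: (H, W, 3) per-pixel normals; mask: (H, W) bool foreground.
    The variance threshold is a toy value, not from the papers.
    """
    if has_holes(mask):
        return False
    fg = normal_map[mask]
    return fg.shape[0] > 0 and fg.var(axis=0).sum() >= var_threshold
```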
“…To fuse the deep features from multiple images, both 3D-R2N2 [6] and LSM [15] apply the recurrent unit GRU, resulting in the networks being permutation-variant and inefficient for aggregating long sequences of images. The recent SilNet [28] and DeepMVS [11] simply use max pooling to preserve the first-order information of the deep features of multiple images, while RayNet [21] applies average pooling to preserve the first-moment information of multiple deep features. MVSNet [31] proposes a variance-based approach to capture the second-moment information for multiple-feature aggregation.…”
Section: Related Work
confidence: 99%
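The three pooling operators contrasted in the quote can be sketched side by side; all three are permutation-invariant over the view axis, unlike a GRU. Shapes and names here are illustrative, not the papers' implementations.

```python
import numpy as np

def aggregate(feats, mode):
    """Permutation-invariant aggregation of per-view deep features.

    feats: (n_views, D).
    'max'  keeps first-order extremes (the max pooling the quote
           attributes to SilNet / DeepMVS),
    'mean' keeps the first moment (RayNet's average pooling),
    'var'  keeps the second moment (MVSNet's variance-based measure).
    """
    if mode == "max":
        return feats.max(axis=0)
    if mode == "mean":
        return feats.mean(axis=0)
    if mode == "var":
        centred = feats - feats.mean(axis=0, keepdims=True)
        return (centred ** 2).mean(axis=0)
    raise ValueError(f"unknown mode: {mode}")
```

Shuffling the views leaves each result unchanged, which is exactly the property a GRU-based aggregator lacks.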
“…To compare with the existing GRU module [6][15] and the widely used max/mean/sum pooling operations [28][11][21], we replace the GRU module of 3D-R2N2 with our fc-based AttSets and with the three max/mean/sum poolings, keeping all other neural layers untouched. Architecture details are in Appendix A.…”
Section: Comparison With GRU And Pooling Operations
confidence: 99%
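A hedged sketch of the fc-based attention aggregation the quote refers to: a fully connected layer scores each per-view feature element-wise, a softmax over the view axis turns the scores into attention weights, and the output is the weighted sum. `W` and `b` are toy stand-ins for learned parameters, not the published weights.

```python
import numpy as np

def attsets(feats, W, b):
    """AttSets-style aggregation (sketch).

    feats: (n_views, D); W: (D, D); b: (D,). Returns a (D,) fused feature.
    The softmax and sum both run over the view axis, so the result is
    invariant to view order, unlike a recurrent (GRU) aggregator.
    """
    scores = feats @ W + b                       # (n_views, D) attention logits
    scores = scores - scores.max(axis=0, keepdims=True)  # numerically stable
    attn = np.exp(scores)
    attn /= attn.sum(axis=0, keepdims=True)      # softmax over views
    return (attn * feats).sum(axis=0)            # attention-weighted sum
```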
“…More related to our work, Favreau et al [21] apply PCA to silhouette images to extract animal gaits from video sequences. The task of predicting silhouette images from 2D input has been effectively used as a proxy for regressing 3D model parameters for humans [22,23] and other 3D objects [24].…”
Section: Related Work
confidence: 99%