2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019
DOI: 10.1109/cvpr.2019.00352

What Do Single-View 3D Reconstruction Networks Learn?

Abstract: Figure 1: We provide evidence that state-of-the-art single-view 3D reconstruction methods (AtlasNet (light green, 0.38 IoU) [12], OGN (green, 0.46 IoU) [46], Matryoshka Networks (dark green, 0.47 IoU) [37]) do not actually perform reconstruction but image classification. We explicitly design pure recognition baselines (Clustering (light blue, 0.46 IoU) and Retrieval (dark blue, 0.57 IoU)) and show that they produce similar or better results both qualitatively and quantitatively. For reference, we show the grou…
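The IoU figures quoted in the abstract compare a predicted shape with the ground truth. As a rough illustration only, the sketch below computes intersection over union for two voxel occupancy sets. The function name `voxel_iou`, the set-of-coordinates representation, and the convention for two empty shapes are assumptions made for this example, not details taken from the paper.

```python
def voxel_iou(pred, target):
    """Intersection over union of two sets of occupied voxel coordinates."""
    union = pred | target
    if not union:
        # Assumed convention: two empty shapes count as a perfect match.
        return 1.0
    return len(pred & target) / len(union)


# Ground truth: a 2x2x2 block of occupied voxels (8 voxels).
target = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}

# Prediction: the same block shifted by one voxel along x.
pred = {(x, y, z) for x in range(1, 3) for y in range(2) for z in range(2)}

# The shapes overlap in 4 voxels and together cover 12, so IoU is 4/12.
iou = voxel_iou(pred, target)
print(f"IoU = {iou:.3f}")
```

A one-voxel shift of a small block already drops the score to about 0.33, which gives a feel for the scale of the values reported above.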

Citations: cited by 361 publications (233 citation statements)
References: 48 publications
“…Single-view 3D reconstruction aims at generating the 3D model of an object based on a single 2D projection of it. Currently, deep Convolutional Neural Networks (ConvNets) have achieved the highest accuracy in various benchmarks by using both low-level image cues, e.g., texture, and high-level semantic information [47,53]. Our work targets at generating the voxel-based representation of teeth volumes, which estimates a voxel occupancy grid for indicating if voxels are within the space of an object.…”
Section: Single-view 3D Reconstruction (mentioning)
confidence: 99%
“…Our task of teeth reconstruction has two unique challenges from the existing voxel-based work. (i) The reconstruction contains multiple objects (teeth) rather than a single object as in [13,47,53]. (ii) The input image of X-ray has a higher resolution than existing work (e.g., 128×128 [13]), which calls for higher computational and memory efficiency of model.…”
Section: D Reconstruction Of Oral Cavity (mentioning)
confidence: 99%
“…According to the theory in [40], the current state-of-the-art in single-view object reconstruction does not actually perform reconstruction but image classification. The convolutional layers of the encoder are identical to the corresponding parts of state-of-the-art classification model ResNet34 except that the feature map sizes are adjusted by our input size.…”
Section: A Network Architecture (mentioning)
confidence: 99%
“…6) CD is a reasonable metric when it is used to compare two complete point cloud. However, when using it as the evaluation metric of point cloud completion task, it cannot detect the degree of the completeness, because CD is prone to outliers [1,19], and it does not possess completeness invariance when it is used to compare an incomplete point cloud with a complete point cloud. Consider the example in Fig 3, the top row of Fig 3 illustrates that the value of CD varies greatly with the position of missing part, even though the missing ratio is the same.…”
Section: Coverage and F-score-coverage (mentioning)
confidence: 99%
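
The last statement above says that the Chamfer Distance (CD) between an incomplete and a complete point cloud depends on where the missing part lies, even at a fixed missing ratio. The sketch below is a toy illustration of that claim, not the citing paper's code. It assumes one common CD variant (the sum of the two mean squared nearest-neighbour distances), and the ten-point line and all function names are made up for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def one_sided(src, dst):
    """Mean squared distance from each point in src to its nearest point in dst."""
    return sum(min(squared_distance(p, q) for q in dst) for p in src) / len(src)


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (assumed variant: sum of both one-sided means)."""
    return one_sided(a, b) + one_sided(b, a)


# Toy "complete" cloud: ten points evenly spaced along the x-axis.
complete = [(float(i), 0.0, 0.0) for i in range(10)]

# Two partial clouds, each missing 2 of 10 points (same missing ratio).
missing_end = [p for p in complete if p[0] < 8]                    # drops x = 8, 9
missing_middle = [p for p in complete if p[0] not in (4.0, 5.0)]   # drops x = 4, 5

cd_end = chamfer_distance(missing_end, complete)
cd_middle = chamfer_distance(missing_middle, complete)
print(f"missing end:    CD = {cd_end:.2f}")
print(f"missing middle: CD = {cd_middle:.2f}")
```

In this toy case the gap at the end yields a CD of 0.5 and the gap in the middle yields 0.2, because the dropped end points have a neighbour on one side only. The brute-force nearest-neighbour search is quadratic in the number of points and is meant for illustration; real point clouds would need a spatial index.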