2017 IEEE International Conference on Computer Vision Workshops (ICCVW) 2017
DOI: 10.1109/iccvw.2017.115

Vision-as-Inverse-Graphics: Obtaining a Rich 3D Explanation of a Scene from a Single Image

Abstract: We develop an inverse graphics approach to the problem of scene understanding, obtaining a rich representation that includes descriptions of the objects in the scene and their spatial layout, as well as global latent variables like the camera parameters and lighting. The framework's stages include object detection, the prediction of the camera and lighting variables, and prediction of object-specific variables (shape, appearance and pose). This acts like the encoder of an autoencoder, with graphics rendering a…

citations
Cited by 28 publications
(22 citation statements)
references
References 13 publications
“…EIG networks can be augmented with multiple scene layers in order to parse faces or other objects under occlusion 61, 62. They can be deployed in parallel or in series (using attention) to parse out multiple objects in a scene 24, 63-66. They can even be extended to other modalities through which we perceive physical objects, such as touch, and can support flexible crossmodal transfer, allowing objects that have only been experienced in one modality (e.g., by sight) to be recognized in another (touch) 61.…”
Section: Discussion (mentioning)
confidence: 99%
“…from a Minecraft game). The work of IM2CAD [7] and [13] use such initialization methods in a realistic scenario and apply their models to real scenes.…”
Section: Learning Direct Optimization (mentioning)
confidence: 99%
“…Recent approaches employ CNNs to infer parameters of objects [26] or whole scenes [27] to aid procedural modeling. A similar trend is observed in graphics applications where CNNs are used to map input images or partial shapes to procedural model parameters [28]-[30].…”
Section: Inverse Procedural Modeling (mentioning)
confidence: 99%