3inGAN: Learning a 3D Generative Model from Images of a Self-similar Scene

Karnewar, Animesh; Wang, Oliver; Ritschel, Tobias; Mitra, Niloy J.

doi:10.1109/3dv57658.2022.00046

Cited by 6 publications

(1 citation statement)

References 99 publications

“…Neural Radiance Field (NeRF) [MST*20], a new volumetric neural representation, provided a breakthrough in terms of producing highly photorealistic (static) representation, simultaneously capturing geometry and appearance from only a set of posed images. A substantial body of work has rapidly emerged to extend the formulation to dynamic settings [LSZ*22,DZY*21,PCPMMN21,LNSW21,TTG*21, XHKK21,GSKH21], work with localized representations for real‐time inference [LGZL*20, RPLG21, YLT*21,LSS*21,SSC22,KRWM22b,FYW*22,WZL*22], support fast training [DLZR22, SSC22, KRWM22b, FYW*22, LCM*22], and investigate applications in the context of generative models [KRWM22a]. However, such representations often lack interpretability, require multiview input, fail to provide scene understanding, and do not provide object‐level factorization or enable object‐level scene manipulation.…”

Section: Introduction (mentioning)

confidence: 99%

Factored Neural Representation for Scene Understanding

Wong

Mitra

2023

Computer Graphics Forum

Self Cite

A long‐standing goal in scene understanding is to obtain interpretable and editable representations that can be directly constructed from a raw monocular RGB‐D video, without requiring specialized hardware setup or priors. The problem is significantly more challenging in the presence of multiple moving and/or deforming objects. Traditional methods have approached the setup with a mix of simplifications, scene priors, pretrained templates, or known deformation models. The advent of neural representations, especially neural implicit representations and radiance fields, opens the possibility of end‐to‐end optimization to collectively capture geometry, appearance, and object motion. However, current approaches produce global scene encoding, assume multiview capture with limited or no motion in the scenes, and do not facilitate easy manipulation beyond novel view synthesis. In this work, we introduce a factored neural scene representation that can directly be learned from a monocular RGB‐D video to produce object‐level neural presentations with an explicit encoding of object movement (e.g., rigid trajectory) and/or deformations (e.g., nonrigid movement). We evaluate ours against a set of neural approaches on both synthetic and real data to demonstrate that the representation is efficient, interpretable, and editable (e.g., change object trajectory). Code and data are available at: http://geometry.cs.ucl.ac.uk/projects/2023/factorednerf/.
