“…Neural Radiance Field (NeRF) [MST*20], a new volumetric neural representation, provided a breakthrough in producing highly photorealistic (static) representations, simultaneously capturing geometry and appearance from only a set of posed images. A substantial body of work has rapidly emerged to extend the formulation to dynamic settings [LSZ*22, DZY*21, PCPMMN21, LNSW21, TTG*21, XHKK21, GSKH21], use localized representations for real‐time inference [LGZL*20, RPLG21, YLT*21, LSS*21, SSC22, KRWM22b, FYW*22, WZL*22], support fast training [DLZR22, SSC22, KRWM22b, FYW*22, LCM*22], and investigate applications in the context of generative models [KRWM22a]. However, such representations often lack interpretability, require multiview input, fail to provide scene understanding, and do not provide object‐level factorization or enable object‐level scene manipulation.…”