We present SPSG, a novel approach to generating high-quality, colored 3D models of scenes from RGB-D scan observations by learning to infer unobserved scene geometry and color in a self-supervised fashion. Our self-supervised approach learns to jointly inpaint geometry and color by correlating an incomplete RGB-D scan with a more complete version of that scan. Notably, rather than relying on 3D reconstruction losses to inform our 3D geometry and color reconstruction, we propose adversarial and perceptual losses operating on 2D renderings in order to achieve high-resolution, high-quality colored reconstructions of scenes. This exploits the high-resolution, self-consistent signal from individual raw RGB-D frames, in contrast to fused 3D reconstructions of the frames, which exhibit inconsistencies from view-dependent effects such as color balancing or pose inconsistencies. Thus, by informing our 3D scene generation directly through 2D signal, we produce high-quality colored reconstructions of 3D scenes, outperforming the state of the art on both synthetic and real data.
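The core idea above is to supervise the 3D prediction through its 2D renderings against raw RGB-D frames rather than against a fused 3D reconstruction. A minimal sketch of such a render-space loss, using a plain per-pixel L1 term restricted to observed pixels as a simplified stand-in for the paper's perceptual and adversarial losses (function and variable names are hypothetical):

```python
import numpy as np

def render_loss(pred_render, target_frame, valid_mask):
    """Per-pixel L1 between a 2D rendering of the predicted colored scene
    and a raw RGB-D frame, averaged over pixels observed in that frame.
    A simplified stand-in for SPSG's 2D perceptual/adversarial losses."""
    diff = np.abs(pred_render - target_frame)   # (H, W, 3) absolute color error
    return diff[valid_mask].mean()              # average only over valid pixels

# Toy example: 4x4 image, top half observed, constant color error of 0.5.
pred = np.zeros((4, 4, 3))
target = np.full((4, 4, 3), 0.5)
mask = np.zeros((4, 4), dtype=bool)
mask[:2] = True
loss = render_loss(pred, target, mask)  # -> 0.5
```

Masking to observed pixels matters because a raw frame only covers part of the scene; unobserved regions should not be penalized against arbitrary frame content.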
[Figure 1 panels: ground-truth RGB image; segmentations (floor, wall, objects) for 7% randomly selected data (29.9% mIoU), 7% actively selected data (43.2% mIoU), and 100% of data (45.6% mIoU); segmentation cross-entropy loss maps, high to low CE loss.]
Figure 1: Our novel active learning method, ViewAL, significantly reduces labeling effort compared to the state of the art. With maximum performance attained by using 100% of the data (last column), ViewAL is able to achieve 95% of this performance with only 7% of the data of SceneNet-RGBD [29]. With the same amount of data, the best state-of-the-art method achieves 88%, and random sampling (2nd column) yields 66% of maximum attainable performance.
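The figure contrasts regions of high and low cross-entropy loss: an active learner requests labels where the model is most uncertain. A minimal sketch of such uncertainty-based selection, ranking unlabeled samples by predictive entropy (ViewAL additionally exploits cross-view consistency, which is omitted here; all names are illustrative):

```python
import numpy as np

def select_uncertain(probs, k):
    """Return the indices of the k samples with the highest predictive
    entropy, a simple uncertainty proxy for active label selection."""
    eps = 1e-12                                           # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)  # (N,) per-sample entropy
    return np.argsort(-entropy)[:k]                       # most uncertain first

# Toy softmax outputs for three unlabeled samples over three classes.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # near-uniform, highly uncertain
    [0.70, 0.20, 0.10],
])
picked = select_uncertain(probs, 1)  # -> [1]
```

Selecting the near-uniform sample first mirrors the figure's intuition that high-CE-loss regions are the most valuable to annotate.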
Figure 1: We present a new approach for 3D reconstruction conditioned on sparse point clouds or low-resolution geometry. Rather than encoding the full generative process in the neural network, which can struggle to represent local detail, we leverage an additional database of volumetric chunks from training scene data. For a given input, multiple approximate reconstructions are first created with retrieved database chunks, which are then fused together with an attention-based blending, facilitating the transfer of coherent structures and local detail from the retrieved training chunks to the output reconstruction.
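The attention-based blending step can be sketched as a softmax over similarity scores between the query and the retrieved chunks, followed by a weighted sum of chunk contents. This is a minimal illustration only; the paper's blending operates on learned volumetric features rather than the raw vectors used here, and all names are hypothetical:

```python
import numpy as np

def blend_chunks(query_feat, chunk_feats, chunk_values):
    """Softmax attention over retrieved database chunks: weight each chunk
    by its feature similarity to the query, then blend chunk contents."""
    scores = chunk_feats @ query_feat            # (K,) similarity per chunk
    w = np.exp(scores - scores.max())            # numerically stable softmax
    w /= w.sum()                                 # attention weights, sum to 1
    return (w[:, None] * chunk_values).sum(axis=0)  # weighted blend of contents

# Toy example: two equally similar chunks -> uniform weights -> plain average.
blended = blend_chunks(
    query_feat=np.array([1.0, 0.0]),
    chunk_feats=np.array([[1.0, 0.0], [1.0, 0.0]]),
    chunk_values=np.array([[0.0], [2.0]]),
)  # -> [1.0]
```

Because the weights are a softmax, a chunk that matches the query much better than the others dominates the blend, while ties fall back to averaging.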