Abstract:
[Figure 1: Contribution of this report. Following a survey of over 250 papers, we provide a review of (Part I) techniques in neural fields such as prior learning and conditioning, representations, forward maps, architectures, and manipulation, and of (Part II) applications in visual computing including 2D image processing, 3D scene reconstruction, generative modeling, digital humans, compression, r… Panel titles: Manipulation; 2D and 3D Reconstruction; Generative Models; Digital Humans; Compression; Robotics; …and Beyond!]
“…Neural implicit representations or neural fields have recently advanced neural processing for 3D data and multi-view 2D images [72,47,59,93,50]. For a review of this emerging space we point the reader to the reports by Kato et al. [34], Tewari et al. [80], and Xie et al. [89]. In particular, a neural radiance field (NeRF) can be fit to a set of posed 2D images and maps a 3D point coordinate and a view direction to an RGB color and density.…”
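As a concrete reading of the quoted definition, the sketch below implements a field that maps a 3D coordinate and a view direction to a color and a density. It is a minimal illustration, not the original NeRF architecture: the positional-encoding frequencies, layer widths, and head structure are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    """Map each coordinate to [x, sin(2^k x), cos(2^k x)] features."""
    feats = [x]
    for k in range(num_freqs):
        feats += [torch.sin(2.0**k * x), torch.cos(2.0**k * x)]
    return torch.cat(feats, dim=-1)

class TinyRadianceField(nn.Module):
    """Minimal sketch of a radiance field: (xyz, view_dir) -> (rgb, sigma)."""
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * (1 + 2 * num_freqs)         # encoded 3D point
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)    # density is view-independent
        self.rgb_head = nn.Linear(hidden + 3, 3)  # color depends on view direction

    def forward(self, xyz, view_dir):
        h = self.trunk(positional_encoding(xyz))
        sigma = torch.relu(self.sigma_head(h))    # keep density non-negative
        rgb = torch.sigmoid(self.rgb_head(torch.cat([h, view_dir], dim=-1)))
        return rgb, sigma

# In practice view_dir would be a unit vector; random inputs only check shapes.
field = TinyRadianceField()
rgb, sigma = field(torch.rand(1024, 3), torch.rand(1024, 3))
```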
Emerging neural radiance fields (NeRF) are a promising scene representation for computer graphics, enabling high-quality 3D reconstruction and novel view synthesis from image observations. However, editing a scene represented by a NeRF is challenging, as the underlying connectionist representations such as MLPs or voxel grids are not object-centric or compositional. In particular, it has been difficult to selectively edit specific regions or objects. In this work, we tackle the problem of semantic scene decomposition of NeRFs to enable query-based local editing of the represented 3D scenes. We propose to distill the knowledge of off-the-shelf, self-supervised 2D image feature extractors such as CLIP-LSeg or DINO into a 3D feature field optimized in parallel to the radiance field. Given a user-specified query of various modalities such as text, an image patch, or a point-and-click selection, 3D feature fields semantically decompose 3D space without the need for re-training, and enable us to semantically select and edit regions in the radiance field. Our experiments validate that the distilled feature fields can transfer recent progress in 2D vision and language foundation models to 3D scene representations, enabling convincing 3D segmentation and selective editing of emerging neural graphics representations.
Preprint. Under review.
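The distillation objective described in this abstract can be sketched as follows: features predicted along each ray are volume-rendered into a per-pixel feature and regressed against the frozen 2D teacher's output, and a query embedding then selects regions by feature similarity. The tensor shapes, the stand-in compositing weights, and the 0.5 threshold are illustrative assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: R rays, S samples per ray, D-dim teacher features.
R, S, D = 512, 64, 512
sample_feats = torch.randn(R, S, D, requires_grad=True)  # feature-head output at samples
weights = torch.softmax(torch.randn(R, S), dim=-1)       # stand-in for the radiance
                                                         # field's compositing weights

# Volume-render features along each ray, exactly as color is rendered.
rendered = (weights.unsqueeze(-1) * sample_feats).sum(dim=-2)  # (R, D) per-pixel features

# Distillation: regress rendered features onto the frozen 2D extractor's
# output (e.g., LSeg or DINO features at the corresponding pixels).
teacher = torch.randn(R, D)
distill_loss = F.mse_loss(rendered, teacher)
distill_loss.backward()

# Query-based selection: cosine similarity against a query embedding
# (from text, an image patch, or a clicked pixel); 0.5 is an assumed threshold.
query = torch.randn(D)
sim = F.normalize(rendered, dim=-1) @ F.normalize(query, dim=0)
mask = sim > 0.5
```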
“…Common representations include voxels [9,10,37] and point clouds [34,35,48,49,42]. More recently, researchers study shapes represented with neural fields [50], e.g., signed distance functions (SDFs) or occupancy (indicator) functions of shapes modeled by neural networks. Subsequently, meshes can be extracted by contouring methods such as marching cubes [22].…”
Section: Neural Shape Representations
Citation type: mentioning (confidence: 99%)
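To make the quoted pipeline concrete, here is a minimal sketch of a neural SDF and mesh extraction by a contouring method (scikit-image's `marching_cubes`). The toy network is untrained, so it is contoured at its mean value purely for demonstration; the architecture and grid resolution are assumptions.

```python
import torch
import torch.nn as nn
from skimage.measure import marching_cubes  # contouring, cf. [22]

class SDFNet(nn.Module):
    """Toy coordinate network f(x, y, z) -> signed distance."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xyz):
        return self.net(xyz).squeeze(-1)

# Evaluate the field on a regular grid, then contour a level set to get a mesh.
res = 32
lin = torch.linspace(-1.0, 1.0, res)
grid = torch.stack(torch.meshgrid(lin, lin, lin, indexing="ij"), dim=-1)
with torch.no_grad():
    sdf = SDFNet()(grid.reshape(-1, 3)).reshape(res, res, res).numpy()

# The untrained net need not cross zero, so contour at its mean value here;
# a trained SDF would be contoured at level 0.
verts, faces, normals, values = marching_cubes(sdf, level=float(sdf.mean()))
```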
“…The methods have been called neural implicit representations [23,24,30,15,11,6,53] or coordinate-based representations [40]. We decided to use the term neural fields in this paper [50].…”
We propose a new representation for encoding 3D shapes as neural fields. The representation is designed to be compatible with the transformer architecture and to benefit both shape reconstruction and shape generation. Existing works on neural fields use grid-based representations, with latents defined on a regular grid. In contrast, we define latents on irregular grids, enabling our representation to be sparse and adaptive. In the context of shape reconstruction from point clouds, our shape representation built on irregular grids improves upon grid-based methods in terms of reconstruction accuracy. For shape generation, our representation promotes high-quality shape generation using auto-regressive probabilistic models. We show different applications that improve over the current state of the art. First, we show results for probabilistic shape reconstruction from a single higher-resolution image. Second, we train a probabilistic model conditioned on very low-resolution images. Third, we apply our model to category-conditioned generation. All probabilistic experiments confirm that we are able to generate detailed, high-quality shapes, yielding a new state of the art in generative 3D shape modeling.
Preprint. Under review.
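A minimal sketch of the "latents on an irregular grid" idea: latent vectors are attached to a sparse, irregular set of anchor points (e.g., subsampled from the input point cloud), and a query point aggregates its k nearest latents before decoding. The softmax-over-distance interpolation and the tiny decoder are illustrative stand-ins, not the paper's learned interpolation or its transformer backbone.

```python
import torch
import torch.nn as nn

class IrregularLatentField(nn.Module):
    """Sketch: latents live at irregular anchor positions, not a regular grid."""
    def __init__(self, anchors, latent_dim=32, k=8):
        super().__init__()
        self.register_buffer("anchors", anchors)            # (M, 3) irregular positions
        self.latents = nn.Parameter(torch.randn(len(anchors), latent_dim))
        self.k = k
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 3, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, query):                               # (N, 3) query points
        d = torch.cdist(query, self.anchors)                # (N, M) distances
        dist, idx = d.topk(self.k, dim=-1, largest=False)   # k nearest anchors
        w = torch.softmax(-dist, dim=-1)                    # distance-based weights
        z = (w.unsqueeze(-1) * self.latents[idx]).sum(dim=1)  # interpolated latent
        return self.decoder(torch.cat([z, query], dim=-1))  # e.g., occupancy logit

anchors = torch.rand(256, 3)                                # e.g., subsampled point cloud
occ_logits = IrregularLatentField(anchors)(torch.rand(1000, 3))
```

Sparsity falls out of this design choice: anchors can be placed only near the surface, so empty space consumes no latent capacity.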
“…3D voxel grids, combined with differentiable ray-marching, first allowed self-supervised discovery of shape and appearance from images [39,27]. Inspired by neural implicit shape representations [34,28,7], neural-field-based representations, combined with neural rendering, lifted limitations of resolution [40,33,30,47,46]. By conditioning on latent variables, this enables 3D reconstruction from just a single observation [40,33].…”
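The differentiable ray-marching referred to in the quote typically composites per-sample densities and colors with the standard volume-rendering quadrature, independent of whether the field is a voxel grid or an MLP. A minimal sketch, with assumed tensor shapes:

```python
import torch

def volume_render(sigma, rgb, deltas):
    """Standard volume-rendering quadrature along a batch of rays.

    sigma:  (R, S)    densities at S samples on each of R rays
    rgb:    (R, S, 3) colors at those samples
    deltas: (R, S)    distances between consecutive samples
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)              # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)    # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]),     # shift: light reaches the
                       trans[:, :-1]], dim=-1)            # first sample unattenuated
    weights = alpha * trans                               # compositing weights
    return (weights.unsqueeze(-1) * rgb).sum(dim=-2)      # (R, 3) pixel colors

pixels = volume_render(torch.rand(4, 64), torch.rand(4, 64, 3),
                       torch.full((4, 64), 0.03))
```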
Neural scene representations, both continuous and discrete, have recently emerged as a powerful new paradigm for 3D scene understanding. Recent efforts have tackled unsupervised discovery of object-centric neural scene representations. However, the high cost of ray-marching, exacerbated by the fact that each object representation has to be ray-marched separately, leads to insufficiently sampled radiance fields and thus noisy renderings, poor framerates, and high memory and time complexity during training and rendering. Here, we propose to represent objects in an object-centric, compositional scene representation as light fields. We propose a novel light field compositor module that enables reconstructing the global light field from a set of object-centric light fields. Dubbed Compositional Object Light Fields (COLF), our method enables unsupervised learning of object-centric neural scene representations, state-of-the-art reconstruction and novel view synthesis performance on standard datasets, and rendering and training speeds orders of magnitude faster than existing 3D approaches.
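The abstract's efficiency argument is that a light field maps a ray directly to a color, so each object is queried once per ray rather than once per ray-march sample. The sketch below pairs per-object light field networks with a softmax-weighted blend; that blend is an illustrative stand-in for COLF's learned compositor module, and the ray parameterization and network sizes are assumptions.

```python
import torch
import torch.nn as nn

class LightFieldNet(nn.Module):
    """One object's light field: ray (origin ++ direction) -> (rgb, blend logit)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                  # 3 color channels + 1 logit
        )

    def forward(self, rays):
        out = self.net(rays)
        return torch.sigmoid(out[..., :3]), out[..., 3]

def composite_light_fields(object_fields, rays):
    """Blend per-object ray colors with softmax weights (illustrative compositor)."""
    colors, logits = zip(*(f(rays) for f in object_fields))
    colors = torch.stack(colors, dim=0)            # (K, R, 3) per-object colors
    weights = torch.softmax(torch.stack(logits, dim=0), dim=0).unsqueeze(-1)
    return (weights * colors).sum(dim=0)           # (R, 3): one query per ray per object

objects = [LightFieldNet() for _ in range(3)]
image_rays = torch.rand(2048, 6)                   # origins ++ directions
pixels = composite_light_fields(objects, image_rays)
```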