IBRNet: Learning Multi-View Image-Based Rendering

Wang, Qianqian; Wang, Zhicheng; Genova, Kyle; Srinivasan, Pratul P.; Zhou, Howard; Barron, Jonathan T.; Martin-Brualla, Ricardo; Snavely, Noah

doi:10.48550/arxiv.2102.13090

Cited by 12 publications

(27 citation statements)

References 56 publications

“…We demonstrate this by comparison to several state-of-the-art methods. Specifically, we evaluate the volumetric representation of NeRF [59], a mesh-based representation similar to SVS [8], the neural signed distance function-based representations of IDR [4] and NLR [5], and the image-based rendering of IBRNet [28]. For SVS [8] we use our own simplified implementation and denote it SVS*.…”

Section: Methods (mentioning)

confidence: 99%

“…Recent IBR techniques leverage neural networks to learn the required blending weights [19][20][21][22][23][24]. These neural IBR methods either use proxy geometry, for example obtained by SfM or MVS [25,26], together with on-surface feature aggregation [7,8] or use learned pixel aggregation functions [27,28] for geometry-free image-based view synthesis. Our approach is closely related to the geometry-assisted and feature-interpolating view synthesis techniques.…”

Section: Related Work (mentioning)

confidence: 99%
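
The "learned blending weights" mentioned in the statement above can be illustrated with a minimal sketch. It assumes that some network has already produced one unnormalized score per source view for a target pixel; the function name `blend_source_colors` and the softmax normalization are illustrative assumptions, not the formulation of IBRNet or of any cited method.

```python
import numpy as np

def blend_source_colors(colors, scores):
    """Blend per-view colors with softmax-normalized weights.

    colors: (V, 3) RGB values sampled from V source views for one target pixel.
    scores: (V,) unnormalized scores (hypothetical stand-in for a network output).
    Returns the blended RGB value and the normalized weights.
    """
    colors = np.asarray(colors, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    # Subtract the maximum score for numerical stability before exponentiating.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum over the view axis gives one RGB value.
    return weights @ colors, weights
```

With equal scores the result reduces to a plain average of the source colors; a larger score pulls the output toward that view's color.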

Fast Training of Neural Lumigraph Representations using Meta Learning

Bergman, Kellnhofer, Wetzstein 2021

Preprint

Novel view synthesis is a long-standing problem in machine learning and computer vision. Significant progress has recently been made in developing neural scene representations and rendering techniques that synthesize photorealistic images from arbitrary views. These representations, however, are extremely slow to train and often also slow to render. Inspired by neural variants of image-based rendering, we develop a new neural rendering approach with the goal of quickly learning a high-quality representation which can also be rendered in real-time. Our approach, MetaNLR++, accomplishes this by using a unique combination of a neural shape representation and 2D CNN-based image feature extraction, aggregation, and re-projection. To push representation convergence times down to minutes, we leverage meta learning to learn neural shape and image feature priors which accelerate training. The optimized shape and image features can then be extracted using traditional graphics techniques and rendered in real time. We show that MetaNLR++ achieves similar or better novel view synthesis results in a fraction of the time that competing methods require. Preprint. Under review.

“…Baselines. Among the recent generalizable NeRF methods [36,52,47], we compare with Pixel-NeRF [52] and PVA [36] which focus on very sparse (up to 3 or 4) input views. We reimplement [36] since it is not open-sourced.…”

Section: Comparison With Generalizable NeRF Methods (mentioning)

confidence: 99%

“…Despite the promising results, these general NeRF [19,53] and human-specific NeRF [13,32,33,35,50] methods must be optimized for each new video separately, and generalize poorly on unseen scenarios. Generalizable NeRFs [36,47,52] try to avoid the expensive per-scene optimization by image-conditioning using pixel-aligned features. However, directly extending such methods to model complex and dynamic 3D humans is not straightforward when available observations are highly sparse.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, neural radiance fields (NeRF) [28,13,19,32,33,35,36,47,50,52,53] have shown photo-realistic novel view synthesis results in per-scene optimization settings. To avoid the expensive per-scene training and improve the practicality, generalizable NeRFs [36,52,47] have been proposed which use image-conditioned, pixel-aligned features and achieve feed-forward view synthesis from sparse input views [36,52]. Direct application of these methods to complex and non-rigid human motion is not straightforward, however, and naive solutions suffer from significant artifacts as shown in Fig.…”

Section: Introduction (mentioning)

confidence: 99%
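
The "pixel-aligned features" referred to in the statements above are commonly obtained by projecting a 3D query point into a source image and interpolating that image's feature map at the projected location. The following is a minimal sketch under stated assumptions: a pinhole camera with intrinsics `K` and world-to-camera pose `R`, `t`, integer pixel coordinates at pixel centers, and a point in front of the camera. The helper names are hypothetical and do not come from any of the cited papers.

```python
import numpy as np

def project_point(point_world, K, R, t):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    Assumes the point lies in front of the camera (positive depth).
    """
    p_cam = R @ point_world + t   # world frame -> camera frame
    uvw = K @ p_cam               # camera frame -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2]       # perspective divide gives (x, y)

def bilinear_sample(feature_map, uv):
    """Sample an (H, W, C) feature map at a continuous location uv = (x, y)."""
    h, w, _ = feature_map.shape
    # Clamp to the image so out-of-bounds queries reuse border features.
    x = float(np.clip(uv[0], 0, w - 1))
    y = float(np.clip(uv[1], 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate along x on the two neighboring rows, then along y.
    top = (1 - ax) * feature_map[y0, x0] + ax * feature_map[y0, x1]
    bottom = (1 - ax) * feature_map[y1, x0] + ax * feature_map[y1, x1]
    return (1 - ay) * top + ay * bottom
```

In a generalizable radiance field, the sampled feature vectors from several source views would then be aggregated and passed to a network that predicts color and density; that aggregation step is outside the scope of this sketch.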

Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering

Kwon, Kim, Ceylan et al. 2021

Preprint

In this paper, we aim at synthesizing a free-viewpoint video of an arbitrary human performance using sparse multi-view cameras. Recently, several works have addressed this problem by learning person-specific neural radiance fields (NeRF) to capture the appearance of a particular human. In parallel, some work proposed to use pixel-aligned features to generalize radiance fields to arbitrary new scenes and objects. Adopting such generalization approaches to humans, however, is highly challenging due to the heavy occlusions and dynamic articulations of body parts. To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture. Specifically, we first introduce a temporal transformer that aggregates tracked visual features based on the skeletal body motion over time. Moreover, a multi-view transformer is proposed to perform cross-attention between the temporally-fused features and the pixel-aligned features at each time step to integrate observations on the fly from multiple views. Experiments on the ZJU-MoCap and AIST datasets show that our method significantly outperforms recent generalizable NeRF methods on unseen identities and poses. The video results and code are available at https://youngjoongunc.github.io/nhp. Preprint. Under review.

RGBD-Net: Predicting Color and Depth Images for Novel Views Synthesis

Nguyen-Ha, Karnewar, Huynh et al. 2021

2021 International Conference on 3D Vision (3DV)

We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network. The former predicts depth maps of the target views by using adaptive depth scaling, while the latter leverages the predicted depths and renders spatially and temporally consistent target images. In the experimental evaluation on standard datasets, RGBD-Net not only outperforms the state-of-the-art by a clear margin, but it also generalizes well to new scenes without per-scene optimization. Moreover, we show that RGBD-Net can be optionally trained without depth supervision while still retaining high-quality rendering. Thanks to the depth regression network, RGBD-Net can also be used for creating dense 3D point clouds that are more accurate than those produced by some state-of-the-art multiview stereo methods.
