Semantic-driven editing approaches, such as stroke-based scene editing [36,41,70], text-driven image synthesis and editing [1,53,56], and attribute-based face editing [28,64], have greatly improved the ease of artistic creation. However, despite the great success of 2D image editing and neural rendering techniques [14,44], comparable editing abilities in 3D remain limited: (1) existing methods require laborious annotation, such as image masks [28,75] or mesh vertices [73,78], to achieve the desired manipulation; (2) they perform global style transfer [12,13,16,21,79] while ignoring the semantic meaning of individual object parts (e.g., the windows and tires of a vehicle should be textured differently); (3) they can edit within a category by learning a textured 3D latent representation (e.g., 3D-aware GANs for faces, cars, etc.) [6,8,9,18,48,60,63,64], or at a coarse level [37,68] via basic color assignment or object-level disentanglement [32], but struggle to perform texture editing on objects with photo-realistic textures or out-of-distribution characteristics.