2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00381

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

Cited by 168 publications (73 citation statements)
References 32 publications
“…Dream Fields [JMB*21] combines NeRF with CLIP to generate diverse 3D objects solely from natural language descriptions, by optimizing the radiance field via multi‐view constraints based on the CLIP scores on the image caption. CLIP‐NeRF [WCH*21] proposes a CLIP‐based shape and appearance mapper to control a conditional NeRF.…”
Section: Applications
confidence: 99%
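As context for the statement above, here is a minimal Python sketch (not the authors' released code) of the CLIP-based multi-view guidance that Dream Fields and CLIP-NeRF build on: views rendered from a radiance field are embedded with CLIP and pushed toward the embedding of a text prompt. The clip_guidance_loss helper and the random stand-in for rendered views are assumptions for illustration; only the open-source clip package (github.com/openai/CLIP) and PyTorch APIs used here are real.

import torch
import clip  # open-source CLIP package: github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_guidance_loss(rendered_views: torch.Tensor, caption: str) -> torch.Tensor:
    # rendered_views: (N, 3, 224, 224) views rendered from the radiance field.
    # Returns 1 - mean cosine similarity between view embeddings and the caption.
    text = clip.tokenize([caption]).to(device)
    image_feat = model.encode_image(rendered_views)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (image_feat @ text_feat.T).mean()

# Random tensor standing in for NeRF renders; in the cited methods the views
# come from the radiance field, so gradients flow back into its parameters.
views = torch.rand(4, 3, 224, 224, device=device, requires_grad=True)
loss = clip_guidance_loss(views, "a red sports car")
loss.backward()

CLIP-NeRF, as the statement notes, additionally feeds the CLIP embeddings through learned shape and appearance mappers that control a conditional NeRF, rather than optimizing rendered views directly.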
“…Language-vision approaches. Self-supervised language-vision models have gone through rapid advances in recent years [62,66,48] due to their impressive generalizability. The seminal work CLIP [66] learns a joint language-vision embedding using more than 400 million text-image pairs. The learned representation is semantically meaningful and expressive, and has thus been adapted to various downstream tasks [82,71,49,79,65].…”
Section: Related Work
confidence: 99%
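To make "joint language-vision embedding" concrete, the following is a minimal Python sketch of the symmetric contrastive objective used to train such embeddings. It is an illustration under assumptions (toy random embeddings, a hypothetical contrastive_loss helper), not CLIP's actual training code.

import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    # img_emb, txt_emb: (B, D) embeddings of B matched image/text pairs.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))       # i-th image matches i-th text
    loss_i2t = F.cross_entropy(logits, targets)  # image-to-text direction
    loss_t2i = F.cross_entropy(logits.T, targets)  # text-to-image direction
    return 0.5 * (loss_i2t + loss_t2i)

# Toy embeddings standing in for image- and text-encoder outputs.
loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))

Trained on hundreds of millions of such pairs, this objective is what yields the semantically expressive shared embedding space the citing work describes.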
“…Neural Radiance Fields (NeRFs) [14] have demonstrated encouraging progress in view synthesis by learning an implicit neural scene representation. Since their introduction, tremendous efforts have been made to improve their quality [28]-[31], speed [32]-[34], artistic effects [35]-[37], and generalization ability [17], [38]. Specifically, Mip-NeRF [39] proposes to cast a conical frustum instead of a single ray for anti-aliasing.…”
Section: A. Neural 3D Rendering
confidence: 99%
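For reference, the "implicit neural scene representation" in the statement above is trained through differentiable volume rendering. The following minimal Python sketch shows the standard per-ray compositing step; the composite_ray helper and toy inputs are assumptions for illustration, not the original NeRF or Mip-NeRF implementation.

import torch

def composite_ray(sigmas: torch.Tensor, colors: torch.Tensor,
                  deltas: torch.Tensor) -> torch.Tensor:
    # sigmas: (S,) densities, colors: (S, 3) RGB, deltas: (S,) sample spacings.
    alphas = 1.0 - torch.exp(-sigmas * deltas)  # opacity of each sample
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas + 1e-10]), dim=0)[:-1]
    weights = trans * alphas                    # contribution of each sample
    return (weights[:, None] * colors).sum(dim=0)  # (3,) composited pixel color

pixel = composite_ray(torch.rand(64), torch.rand(64, 3), torch.full((64,), 0.03))

Mip-NeRF's change sits upstream of this step: each sample represents a conical frustum (encoded with an integrated positional encoding over a volume) rather than a point on a single ray, which is what provides the anti-aliasing the statement mentions.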