2021
DOI: 10.48550/arxiv.2112.01520
Preprint

Recognizing Scenes from Novel Viewpoints

Abstract: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to interact efficiently with the scene and its objects. In this work, we attempt to endow machines with this ability. We propose a model that takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories, without access to the RGB images from those views. We pair 2D…

Cited by 4 publications
(5 citation statements)
References 68 publications
(95 reference statements)
“…Using neural networks to implicitly represent 3D scenes [29,36,48,49,52,57] has drawn much recent attention. NeRF [32] and its variants [2,3,30,34,53,54,61,66] have achieved impressive results on novel view synthesis [8,57,63] and have many applications, including 3D reconstruction [28,52,64,69,72], semantic segmentation [19,44,71], generative modeling [5,6,9,35,46], and 3D content creation [1,17,37,42,55,65].…”
Section: Related Work (mentioning)
confidence: 99%
“…Most similarly to our approach, 2D3DNet [17] obtains 2D features in each image using a pre-trained segmentation model, projects ("lifts") these predictions to 3D points, and refines them with a 3D network trained without 3D labels, bypassing the need for 3D annotations during training. Another method [33] predicts semantic segmentation for a target viewpoint by rendering a volumetric 3D representation of projected semantics predicted by a pre-trained segmentation model. Like the latter two works, we use an existing pre-trained generic segmentation network, but with a synthesized appearance view.…”
Section: Related Work (mentioning)
confidence: 99%
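The "lift" step mentioned above can be shown with a minimal sketch. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy) and a per-pixel depth map aligned with the 2D label map. The helper `lift_labels` is hypothetical and is not taken from 2D3DNet [17] or [33]. It only shows how each labeled pixel becomes a labeled 3D point in camera coordinates.

```python
from typing import List, Tuple

# (x, y, z, semantic_label) in camera coordinates
Point = Tuple[float, float, float, int]


def lift_labels(depth: List[List[float]], labels: List[List[int]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Hypothetical helper: unproject every pixel (u, v) with positive depth
    into camera coordinates and carry its 2D semantic label along."""
    points: List[Point] = []
    for v, (depth_row, label_row) in enumerate(zip(depth, labels)):
        for u, (z, label) in enumerate(zip(depth_row, label_row)):
            if z <= 0:  # skip pixels with missing or invalid depth
                continue
            # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, label))
    return points


# Usage: a 2x2 depth map in which one pixel has no valid depth.
pts = lift_labels([[2.0, 0.0], [1.0, 4.0]], [[3, 1], [5, 7]],
                  fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

A real pipeline would also transform these camera-frame points into a shared world frame using each view's pose before fusing them across images.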
“…where $R_{s,t}$ is the 2D rotation matrix of $\varphi_{s,t}$. Following common practice in indoor scene reconstruction, we give the camera a fixed downward tilt [99,71,98] and estimate only the azimuth [41]. This is also a common assumption in audio localization [1,86], since azimuth has strong binaural cues.…”
Section: Estimating Pose and Localizing Sounds (mentioning)
confidence: 99%
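The following sketch builds a counter-clockwise 2D rotation matrix from an azimuth angle and applies it to a ground-plane vector. It assumes that $\varphi_{s,t}$ is the azimuth of target view $t$ relative to source view $s$, measured in radians. The sign convention, the wrap to $[-\pi, \pi)$, and the helper names `rotation_2d`, `rotate`, and `relative_azimuth` are illustrative assumptions and do not come from the cited paper's code.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Vec2, Vec2]


def rotation_2d(phi: float) -> Mat2:
    """Counter-clockwise 2D rotation matrix for an azimuth angle phi (radians)."""
    c, s = math.cos(phi), math.sin(phi)
    return ((c, -s), (s, c))


def rotate(R: Mat2, p: Vec2) -> Vec2:
    """Matrix-vector product of a 2x2 matrix and a 2D vector."""
    return (R[0][0] * p[0] + R[0][1] * p[1],
            R[1][0] * p[0] + R[1][1] * p[1])


def relative_azimuth(yaw_s: float, yaw_t: float) -> float:
    """Hypothetical phi_{s,t}: heading of view t relative to view s,
    wrapped to [-pi, pi)."""
    return (yaw_t - yaw_s + math.pi) % (2.0 * math.pi) - math.pi


# Usage: the target view is turned 90 degrees from the source view.
phi = relative_azimuth(0.0, math.pi / 2)
x, y = rotate(rotation_2d(phi), (1.0, 0.0))  # approximately (0.0, 1.0)
```

In this setting the tilt is fixed and only the azimuth is estimated. A single angle therefore determines the relative rotation, so a 2x2 matrix is sufficient instead of a full 3D rotation.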
“…The rotations are limited to (10°, 90°) relative to the source viewpoints. Following standard practice, we set the agent height to 1.5 m and lock a downward tilt angle [41,98,71,99]. We render the binaural RIRs and images given the positions of the agents and sound sources.…”
Section: Dataset (mentioning)
confidence: 99%
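The viewpoint protocol in the Dataset statement can be sketched as a small sampler. The agent height (1.5 m) and the 10° to 90° rotation range come from the quoted text. Several details are assumptions: the target view rotates in place at the source position, the turn is drawn uniformly and to the left or right at random, and the tilt is copied unchanged from the source. The tilt value in the usage line is only a placeholder, because the text does not state it. `Viewpoint`, `sample_target_view`, and `angular_offset_deg` are hypothetical names.

```python
import random
from dataclasses import dataclass, replace

AGENT_HEIGHT_M = 1.5                   # agent height stated in the quoted text
MIN_ROT_DEG, MAX_ROT_DEG = 10.0, 90.0  # rotation range relative to the source view


@dataclass(frozen=True)
class Viewpoint:
    x: float            # ground-plane position (metres)
    z: float
    height: float       # camera height above the floor (metres)
    azimuth_deg: float  # heading in degrees, in [0, 360)
    tilt_deg: float     # locked downward tilt; the real value is not given


def sample_target_view(source: Viewpoint, rng: random.Random) -> Viewpoint:
    """Hypothetical sampler: rotate the source view in place by 10 to 90
    degrees, left or right at random, keeping height and tilt locked."""
    magnitude = rng.uniform(MIN_ROT_DEG, MAX_ROT_DEG)
    sign = rng.choice((-1.0, 1.0))
    azimuth = (source.azimuth_deg + sign * magnitude) % 360.0
    return replace(source, height=AGENT_HEIGHT_M, azimuth_deg=azimuth)


def angular_offset_deg(a: Viewpoint, b: Viewpoint) -> float:
    """Smallest absolute azimuth difference between two views, in degrees."""
    d = (b.azimuth_deg - a.azimuth_deg) % 360.0
    return min(d, 360.0 - d)


# Usage (tilt of -20 degrees is a placeholder, not a value from the paper).
rng = random.Random(0)
src = Viewpoint(x=0.0, z=0.0, height=AGENT_HEIGHT_M, azimuth_deg=350.0, tilt_deg=-20.0)
tgt = sample_target_view(src, rng)
```

Because the turn never exceeds 90°, the smallest angular offset between the source and target headings equals the sampled magnitude. This makes the constraint easy to check after sampling.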