2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00230

3D-RelNet: Joint Object and Relational Network for 3D Prediction

Abstract: https://nileshkulkarni.github.io/relative3d/

[Figure 1: (a) Approach Overview: We study the problem of layout estimation in 3D by reasoning about relationships between objects. Given an image and object detection boxes, we first predict the 3D pose (translation, rotation, scale) of each object and the relative pose between… The figure shows instance bounding boxes passed to an object encoder (per-object translation, rotation, scale) and union (pair) bounding boxes passed to a relative encoder (per-pair), with the two combined into the final prediction. (b) Results.]
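The overview above describes predicting a pose per object and a relative pose per object pair, then fusing the two into a final estimate. As a rough illustration only (this is not the authors' exact formulation; the function name `combine_translations` and the simple averaging scheme are assumptions made for this sketch), the translation component of such a fusion might look like:

```python
import numpy as np

def combine_translations(per_object, relative):
    """Fuse per-object translation predictions with pairwise
    relative translations by simple averaging (illustrative sketch).

    per_object: (N, 3) array, predicted translation t_i per object
    relative:   dict mapping (i, j) -> predicted offset t_j - t_i
    """
    n = per_object.shape[0]
    fused = per_object.copy()
    counts = np.ones(n)
    for (i, j), t_rel in relative.items():
        # each relative estimate gives a second opinion on t_j ...
        fused[j] += per_object[i] + t_rel
        counts[j] += 1
        # ... and, reversed, on t_i
        fused[i] += per_object[j] - t_rel
        counts[i] += 1
    return fused / counts[:, None]
```

When the per-object and per-pair predictions are mutually consistent, the fused output simply reproduces the per-object estimates; when they disagree, averaging pulls each object toward the pairwise evidence.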

Cited by 48 publications (48 citation statements). References 39 publications.
“…These extensions have also been combined by Lin et al ( 2020 ). 3D-RelNet is also an object-centric model that predicts a pose for each object and their relation to the other objects in the scene (Kulkarni et al, 2019 ). While these approaches seem promising, in their current implementation they only consider video data from a fixed camera viewpoint.…”
Section: Discussion
confidence: 99%
“…While a lot of research on learning generative models of the environment has been performed, most of them only consider individual objects (Sitzmann et al, 2019b ; Häni et al, 2020 ), consider scenes with a fixed camera viewpoint (Kosiorek et al, 2018 ; Kulkarni et al, 2019 ; Lin et al, 2020 ) or train a separate neural network for each novel scene (Mildenhall et al, 2020 ; Sitzmann et al, 2020 ). We tackle the problem of an active agent that can control the extrinsic parameters of an RGB camera as an active vision system.…”
Section: Introduction
confidence: 99%
“…The most relevant works to us are [22,46,19,9], which take a single image as input and reconstruct multiple object shapes in a scene. However, the methods [22,46,19] are designed for voxel reconstruction with limited resolution. Mesh R-CNN [9] produces object meshes, but still treats objects as isolated geometries without considering the scene context (room layout, object locations, etc.).…”
Section: Related Work
confidence: 99%
“…In indoor environments, object poses generally follow a set of interior design principles, making it a latent pattern that can be learned. By parsing images, previous works either predict 3D boxes object-wisely [14,46] or only consider pair-wise relations [19]. In our work, we assume each object has a multi-lateral relation between its surroundings, and take all in-room objects into account in predicting its bounding box.…”
Section: 3D Object Detection and Layout Estimation
confidence: 99%