2021 · Preprint · DOI: 10.48550/arxiv.2103.07894
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images

Abstract: Grounding referring expressions in RGBD images is an emerging field. We present a novel task of 3D visual grounding in a single-view RGBD image, where the referred objects are often only partially scanned due to occlusion. In contrast to previous works that directly generate object proposals for grounding in 3D scenes, we propose a bottom-up approach that gradually aggregates context-aware information, effectively addressing the challenge posed by partial geometry. Our approach first fuses the language …
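The abstract is truncated before the fusion step is fully described. As a generic sketch only (not the paper's actual architecture), the snippet below shows one common way a sentence-level language embedding can be fused with per-point visual features before bottom-up aggregation; it assumes PyTorch, and all module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LanguageVisualFusion(nn.Module):
    """Illustrative sketch: broadcast a sentence embedding of the referring
    expression to every point and fuse it with per-point visual features.
    Dimensions and layer choices are assumptions, not the paper's design."""

    def __init__(self, vis_dim=128, lang_dim=256, out_dim=128):
        super().__init__()
        self.lang_proj = nn.Linear(lang_dim, vis_dim)  # project language into the visual feature space
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim * 2, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, point_feats, lang_feat):
        # point_feats: (B, N, vis_dim) per-point visual features
        # lang_feat:   (B, lang_dim)   embedding of the referring expression
        lang = self.lang_proj(lang_feat).unsqueeze(1)        # (B, 1, vis_dim)
        lang = lang.expand(-1, point_feats.size(1), -1)      # broadcast to all N points
        fused = torch.cat([point_feats, lang], dim=-1)       # (B, N, 2 * vis_dim)
        return self.fuse(fused)                              # (B, N, out_dim)

if __name__ == "__main__":
    model = LanguageVisualFusion()
    pts = torch.randn(2, 1024, 128)   # 1024 points per scene
    lang = torch.randn(2, 256)        # one referring expression per scene
    print(model(pts, lang).shape)     # torch.Size([2, 1024, 128])
```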

Cited by 1 publication (1 citation statement) · References 47 publications
“…It extends the 2D visual grounding task to 3D space. Recent studies [1], [3], [8], [13], [20] aim to localize objects in 3D indoor scenes and rely heavily on 3D proposals, using PointNet++ [26] or VoteNet [25] as the backbone at increased computational cost. Furthermore, these methods identify objects in an indoor scene by a 3D bounding box representation containing 3D location, 3D size, and a one-axis rotation angle. This representation has fewer degrees of freedom than complex manipulation tasks require, since the orientation is restricted to the horizontal plane.…”
Section: Introduction
confidence: 99%
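The citing work's point is that a gravity-aligned box (location, size, and a single yaw angle) cannot express arbitrary orientations. Below is a minimal sketch, assuming NumPy; the 7-DoF (x, y, z, width, length, height, yaw) parameterization is the common convention in indoor detectors such as VoteNet, while the class and field names here are purely illustrative.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Box7DoF:
    """Gravity-aligned 3D box: 3D location + 3D size + one yaw angle.
    Rotation is restricted to the horizontal (ground) plane."""
    center: np.ndarray   # (3,) x, y, z
    size: np.ndarray     # (3,) width, length, height
    yaw: float           # rotation about the vertical axis only

    def rotation(self) -> np.ndarray:
        """Full 3x3 rotation matrix; roll and pitch are implicitly zero."""
        c, s = np.cos(self.yaw), np.sin(self.yaw)
        return np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])

# A 9-DoF box would replace `yaw` with a full 3D rotation (e.g. roll, pitch,
# yaw or a quaternion), which manipulation tasks that handle tilted objects
# would need -- this is the limitation the citation statement highlights.
box = Box7DoF(center=np.zeros(3), size=np.array([0.5, 0.5, 1.0]), yaw=np.pi / 4)
print(box.rotation())
```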