2021
DOI: 10.48550/arxiv.2112.08879
Preprint

Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds

Abstract: Existing language grounding models often use object proposal bottlenecks: a pre-trained detector proposes objects in the scene and the model learns to select the answer from these box proposals, without attending to the original image or 3D point cloud. Object detectors are typically trained on a fixed vocabulary of objects and attributes that is often too restrictive for open-domain language grounding, where an utterance may refer to visual entities at various levels of abstraction, such as a chair, the leg o…

References 27 publications