2021
DOI: 10.48550/arxiv.2110.10189
Preprint
StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects

Abstract: Rearrange objects that are smaller than the green glass pan tower, top, left, west line, top, left, large circle, top, right, large, north Rearrange objects that have the same color as the glass stapler tower, top, right, west line, bottom, left, large circle, top, left, small, west Rearrange yellow objects circle, bottom, right, medium circle, top, middle, medium circle, top, middle, large Rearrange yellow objects Rearrange objects that have the same material as the blue object circle, bottom, right, large ci…

Cited by 5 publications (9 citation statements)
References 30 publications
“…However, setting up a real-world TAMP system often requires substantial task-specific knowledge and accurate 3D models of the environment, significantly limiting the environments to which the system can generalize. To address this challenge, recent work has adopted deep learning-based approaches for robotic manipulation, for instance, on grasp planning [44,47,48,62,65], motion planning [7,57], and reasoning about spatial relations [20,36,49].…”
Section: Related Work (mentioning)
confidence: 99%
“…In contrast, we propose to use optical flow as the low-level feature descriptors, which can be naturally used to infer the full 6D transformations. In parallel to our work, recent efforts have also addressed rearrangement particularly learned from human demonstrations [14,72] and also with different goal specifications such as language [36,55].…”
Section: Related Work (mentioning)
confidence: 99%
“…Language-Instructed Manipulation Recently, various manipulation tasks have been researched with language input either describing the entire task, or serving interactive input for task specifications. Structformer [23] proposes an object selection network from language and visual encodings, as well as a language conditioned pose generator for semantic object rearrangement. Stepputtis et al [24] proposed a closed-loop control model for pouring tasks.…”
Section: Related Work (mentioning)
confidence: 99%
“…Natural language processing has recently received much attention in the field of robotics [8], following the advances made towards learning groundings between vision and language [9], [10], [11]. Recent successes in human-robot interaction include an interactive fetching system to localize objects mentioned in referring expressions [12], [13], [14], [15], [16] or grounding not only objects, but also spatial relations to follow language expressions characterizing pick-and-place commands [17], [18], [19]. By contrast, CALVIN tasks require grounding language to a wide variety of general-purpose robot skills.…”
Section: Related Work (mentioning)
confidence: 99%