2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9197331
Learning to See before Learning to Act: Visual Pre-training for Manipulation

Abstract: Does having visual priors (e.g. the ability to detect objects) facilitate learning to perform vision-based manipulation (e.g. picking up objects)? We study this problem under the framework of transfer learning, where the model is first trained on a passive vision task (i.e., the data distribution does not depend on the agent's decisions), then adapted to perform an active manipulation task (i.e., the data distribution does depend on the agent's decisions). We find that pre-training on vision tasks significantl…

Cited by 58 publications (36 citation statements)
References 43 publications
“…VGN [4] predicts 6-DoF grasps in clutter with a one-stage pipeline from input depth images. There is also a line of works that estimate affordance of an object or a scene first and then detect grasps based on estimated affordance [42,24,54]. In most of the prior works, deep networks are trained end-to-end with only grasp supervision.…”
Section: A. Learning Grasp Detection (mentioning)
Confidence: 99%
“…In contrast, Yen-Chen et al. [36] showed that pre-training on semantic tasks like classification and segmentation helps improve the efficiency and generalization of grasping predictions.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Regarding learning affordances for grasping, the majority of previous works use ground-truth affordance labels to learn affordances for grasping ([18], [33], [34], [29]). [18] use thermal maps to learn the graspable positions of several household objects.…”
Section: Related Work, A. Affordance Learning for Grasping (mentioning)
Confidence: 99%