Robotics: Science and Systems XVI 2020
DOI: 10.15607/rss.2020.xvi.076

Scaling data-driven robotics with reward sketching and batch reinforcement learning

Abstract: By harnessing a growing dataset of robot experience, we learn control policies for a diverse and increasing set of related manipulation tasks. To make this possible, we introduce reward sketching: an effective way of eliciting human preferences to learn the reward function for a new task. This reward function is then used to retrospectively annotate all historical data, collected for different tasks, with predicted rewards for the new task. The resulting massive annotated dataset can then be used to learn mani…
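The retrospective annotation step described in the abstract can be illustrated with a minimal sketch. It assumes a reward function has already been learned from sketches; the names `annotate_episodes` and `toy_reward`, the list-of-floats observations, and the dictionary layout are all hypothetical placeholders chosen for illustration, not the paper's implementation.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for an image observation: a flat list of floats.
Frame = Sequence[float]


def annotate_episodes(
    episodes: List[List[Frame]],
    reward_fn: Callable[[Frame], float],
) -> List[Dict[str, list]]:
    """Label every frame of every stored episode with a predicted reward.

    The episodes may have been collected for other tasks; the reward
    function for the new task is applied to all of them after the fact.
    """
    annotated = []
    for episode in episodes:
        rewards = [float(reward_fn(frame)) for frame in episode]
        annotated.append({"observations": list(episode), "rewards": rewards})
    return annotated


def toy_reward(frame: Frame) -> float:
    # Placeholder for a learned reward model (assumption): the mean of the
    # frame values, clipped to [0, 1].
    return max(0.0, min(1.0, sum(frame) / len(frame)))


# Two short "historical" episodes with made-up observations.
history = [[[0.0, 0.2], [0.4, 0.6]], [[1.0, 1.0]]]
dataset = annotate_episodes(history, toy_reward)
```

The annotated `dataset` pairs each observation sequence with per-step rewards, which is the form a batch (offline) reinforcement learning method would consume.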

Citations: cited by 43 publications (48 citation statements)
References: 50 publications (60 reference statements)
“…3D environments, requiring a much more sophisticated dynamics model. Third, we use a richer form of feedback, reward sketches (Cabi et al, 2020), rather than sparse label-based feedback.…”
Section: Learned Dynamics Models In Prior Work (mentioning)
confidence: 99%
“…Much like our work, a number of prior works have studied how learning from broad datasets can enhance generalization in robot learning [16,33,56,13,22,24,10,5]. These works DVD is trained to predict if two videos are completing the same task or not.…”
Section: Robotic Learning From Large Datasets (mentioning)
confidence: 99%
“…have largely studied the problem of collecting large and diverse robotic datasets in scalable ways [28,22,10,53,7] as well as techniques for learning general purpose policies from this style of data in an offline [13,5] or online [33,29,24] fashion. While our motivation of achieving generalization by learning from diverse data heavily overlaps with the above works, our approach fundamentally differs in that it aims to sidestep the challenges associated with collecting diverse robotic data by instead leveraging existing human data sources.…”
Section: Robotic Learning From Large Datasets (mentioning)
confidence: 99%
“…As discussed previously, our goal is to train control policies in the real world in a timescale that is feasible for industrial use cases. [20] demonstrated that it is possible to train a USB-insertion policy directly from pixels, but it took 8 Calculate relative pose $T^{t}_{\mathrm{rel}} = T^{t}_{i} \times T_{i}^{-1}$ 7:…”
Section: Pre-trained Visual Features (mentioning)
confidence: 99%