Robotics: Science and Systems XV 2019
DOI: 10.15607/rss.2019.xv.023

Learning Reward Functions by Integrating Human Demonstrations and Preferences

Abstract: Our goal is to accurately and efficiently learn reward functions for autonomous robots. Current approaches to this problem include inverse reinforcement learning (IRL), which uses expert demonstrations, and preference-based learning, which iteratively queries the user for her preferences between trajectories. In robotics however, IRL often struggles because it is difficult to get high-quality demonstrations; conversely, preference-based learning is very inefficient since it attempts to learn a continuous, high…
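As a rough illustration of how demonstrations and preference queries can be combined, the sketch below assumes a linear reward over hand-crafted trajectory features, treats the demonstration as inducing a soft-optimality prior over reward weights, and updates that prior with Bradley-Terry style preference likelihoods. All function names, feature values, and rationality coefficients are assumptions for illustration; this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_weights(n, d):
    """Sample candidate reward weights uniformly on the unit sphere."""
    w = rng.normal(size=(n, d))
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def demo_log_likelihood(w, demo_features, beta_demo=1.0):
    """Soft-optimality assumption: demonstrated trajectories have high reward."""
    return beta_demo * (w @ demo_features.T).sum(axis=1)

def pref_log_likelihood(w, feat_a, feat_b, answer, beta_pref=1.0):
    """Bradley-Terry style likelihood of preferring A (answer=1) over B (answer=0)."""
    diff = beta_pref * (w @ (feat_a - feat_b))
    p_a = 1.0 / (1.0 + np.exp(-diff))
    return np.log(np.where(answer == 1, p_a, 1.0 - p_a) + 1e-12)

# Toy setup: 3 reward features, one demonstration, two preference answers.
d = 3
W = sample_unit_weights(5000, d)                  # prior samples over reward weights
demo_features = np.array([[1.0, 0.2, -0.5]])      # features of the demonstrated trajectory
log_post = demo_log_likelihood(W, demo_features)  # demonstration-induced coarse prior

queries = [(np.array([0.9, 0.0, 0.0]), np.array([0.0, 0.9, 0.0]), 1),
           (np.array([0.0, 0.0, 1.0]), np.array([0.5, 0.5, 0.0]), 0)]
for feat_a, feat_b, answer in queries:
    log_post += pref_log_likelihood(W, feat_a, feat_b, answer)

post = np.exp(log_post - log_post.max())
post /= post.sum()
w_hat = post @ W                                  # posterior mean reward weights
print("estimated reward weights:", w_hat)
```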

Citations: cited by 76 publications (62 citation statements)
References: 23 publications (47 reference statements)
“…Here, it is assumed that the collaborative agent has access to some underlying human reward function (usually inferred through IRL or inverse planning approaches). The human is modeled to act rationally with the highest probability, but with a non-zero probability of behaving sub-optimally [20, 47-50].…”
Section: First-order Mental Models (mentioning, confidence: 99%)
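One common instantiation of this noisily-rational human model is a Boltzmann (softmax) policy over the human's action values. The sketch below is illustrative only; the rationality coefficient `beta` and the toy Q-values are assumptions, not taken from the cited works.

```python
import numpy as np

def boltzmann_policy(q_values, beta=2.0):
    """Noisily-rational human model: P(a | s) proportional to exp(beta * Q(s, a)).

    beta -> infinity recovers a perfectly rational human; beta = 0 gives
    uniformly random (fully sub-optimal) behaviour.
    """
    logits = beta * (q_values - np.max(q_values))  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# The best action is chosen with the highest probability,
# but worse actions retain non-zero probability.
print(boltzmann_policy(np.array([1.0, 0.5, -0.2])))
```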
“…MacGlashan et al. [25] presented a system to ground natural language commands to reward functions that captured a desired task, and used natural language as an interface for specifying rewards. Palan et al. [26] used demonstrations to learn a coarse prior over the space of reward functions, to reduce the effective size of the space from which queries are generated.…”
Section: Related Work (mentioning, confidence: 99%)
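One simple way to realize "generating queries from the reduced space" is to score candidate trajectory pairs by how strongly the current posterior over reward weights disagrees about which trajectory is better, and ask the user about the most contested pair. The heuristic and all names below are assumptions for illustration, not the paper's actual query-selection objective.

```python
import numpy as np

rng = np.random.default_rng(1)

def query_uncertainty(weight_samples, feat_a, feat_b, beta=1.0):
    """Uncertainty of the posterior about which trajectory is better;
    peaks when the mean preference probability is near 0.5."""
    diff = beta * (weight_samples @ (feat_a - feat_b))
    p_a = 1.0 / (1.0 + np.exp(-diff))
    return np.mean(p_a) * (1.0 - np.mean(p_a))

def pick_query(weight_samples, candidate_features):
    """Choose the pair of candidate trajectories the posterior is most unsure about."""
    best, best_score = None, -np.inf
    n = len(candidate_features)
    for i in range(n):
        for j in range(i + 1, n):
            score = query_uncertainty(weight_samples,
                                      candidate_features[i], candidate_features[j])
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Toy usage: posterior samples over 3 reward features, 4 candidate trajectories.
W = rng.normal(size=(1000, 3))
candidates = rng.normal(size=(4, 3))
print("most contested pair:", pick_query(W, candidates))
```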
“…Ma et al. [45] employed the RGB image as the visual input and presented a DRL-based mapless motion planner, alleviating the need for interactions between the agent and the environment. A few special techniques and model structures are also used in navigation tasks, including multiple subtasks to assist reinforcement learning [46], continuous motion control based on DDPG [47], and target-driven navigation [48]. Most of the above-mentioned methods focus on improving the reinforcement learning structure, and the reward is mostly sparse.…”
Section: Related Work (mentioning, confidence: 99%)