Robotics: Science and Systems XI 2015
DOI: 10.15607/rss.2015.xi.032

Shared Autonomy via Hindsight Optimization

Abstract: In shared autonomy, user input and robot autonomy are combined to control a robot to achieve a goal. Often, the robot does not know a priori which goal the user wants to achieve, and must both predict the user's intended goal, and assist in achieving that goal. We formulate the problem of shared autonomy as a Partially Observable Markov Decision Process with uncertainty over the user's goal. We utilize maximum entropy inverse optimal control to estimate a distribution over the user's goal based on the…
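The abstract's formulation — a POMDP whose only uncertainty is over the user's goal, solved approximately via hindsight optimization — can be illustrated with a minimal sketch. This is not the authors' implementation; the goal set, belief, and cost-to-go function below are hypothetical stand-ins. Hindsight optimization (QMDP-style) assumes the goal uncertainty resolves after the current step, so the robot simply picks the action with the lowest expected cost-to-go under its current belief over goals.

```python
# Minimal sketch (hypothetical, not the paper's code): hindsight-optimization
# action selection under a belief over the user's goal.

def hindsight_action(actions, goals, belief, q_value):
    """Pick the action with lowest expected cost-to-go under the goal belief.

    belief[g]      -- probability that the user's intended goal is g
    q_value(a, g)  -- cost-to-go of taking action a if g were the true goal
    """
    def expected_cost(a):
        return sum(belief[g] * q_value(a, g) for g in goals)
    return min(actions, key=expected_cost)

# Toy example: two candidate goals on a line, robot at the origin,
# actions move one unit left or right; distance-to-goal stands in for cost.
goals = [-3.0, 3.0]
belief = {-3.0: 0.2, 3.0: 0.8}
q = lambda a, g: abs(a - g)
print(hindsight_action([-1.0, 1.0], goals, belief, q))  # 1.0: moves toward the likelier goal
```

In the paper's full system, the belief over goals is updated from user input via maximum entropy inverse optimal control rather than fixed as here.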

Cited by 135 publications (146 citation statements)
References 22 publications
“…IRL assumes access to high-quality demonstrations of the task. However, this is rarely available in robotics, where it is difficult to control high degree-of-freedom (DOF) robots [15,27,29]. Preference-based learning methods, on the other hand, are very inefficient since they attempt to learn a continuous reward function from binary feedback.…”
Section: Introduction
confidence: 99%
“…In such interaction paradigms, the robot aims to infer a cost function or policy that best describes the examples that it has received. New avenues of research focus on learning such robot objectives from human input through demonstrations [9], [10], teleoperation data [11], corrections [12], [13], comparisons [14], examples of what constitutes a goal [15], or even specified proxy objectives [16]. In this paper, we focus on learning from two such types of human input — demonstrations and physical corrections — although we stress that the principles outlined in our formalism are more general and could be applied to the other interaction modes mentioned.…”
Section: A. Robots Learning From Humans
confidence: 99%
“…To approximate the intractable integral in (12), we sampled a set X of 1500 trajectories. We sampled costs according to (11), given by random unit-norm θs, then optimized them with an off-the-shelf trajectory optimizer. We used TrajOpt [41], which is based on sequential quadratic programming and uses convex-convex collision checking.…”
Section: B. Approximation
confidence: 99%
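The sampling step quoted above — drawing random unit-norm weight vectors θ, each defining a linear cost over trajectory features — can be sketched as follows. This is a hedged illustration, not the citing paper's code: the feature dimension is a made-up placeholder, and the subsequent TrajOpt optimization of each sampled cost is omitted. Normalizing Gaussian draws yields vectors distributed uniformly on the unit sphere.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_thetas(n_samples, dim):
    """Sample weight vectors uniformly from the unit sphere in R^dim.

    Each row is a unit-norm theta defining a linear trajectory cost
    theta . phi(xi); the optimizer (e.g. TrajOpt) would then be run
    once per sampled cost to produce a candidate trajectory.
    """
    thetas = rng.normal(size=(n_samples, dim))        # isotropic Gaussian draws
    return thetas / np.linalg.norm(thetas, axis=1, keepdims=True)

# 1500 sampled costs, matching the excerpt; dim=5 is a hypothetical feature count.
thetas = sample_unit_thetas(1500, 5)
```

Uniformity on the sphere follows from the rotational invariance of the isotropic Gaussian, which is why normalized Gaussian draws are preferred over, say, normalizing uniform-cube samples (which would bias toward the cube's corners).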