2018 IEEE International Conference on Robotics and Automation (ICRA) 2018
DOI: 10.1109/icra.2018.8460854

Active Reward Learning from Critiques

Cited by 48 publications (51 citation statements)
References 10 publications
“…Another approach for richer feedback could allow users to manually indicate, and potentially correct, the undesirable sections of presented paths. This idea is investigated by Cui and Niekum (2018), where users segment a robot's trajectory into good and bad parts.…”
Section: Discussion and Future Work
confidence: 99%
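The segment-level critiques described above can be made concrete with a small sketch. All names here are illustrative, not taken from the paper: a critique is a set of trajectory index ranges the user labels good or bad, which can be expanded into per-step labels for learning.

```python
# Hypothetical representation of critique-style feedback: a user marks
# index ranges of a trajectory as good (+1) or bad (-1).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Critique:
    """User feedback: (start, end, label) ranges, label is +1 or -1."""
    segments: List[Tuple[int, int, int]]


def per_step_labels(traj_len: int, critique: Critique) -> List[int]:
    """Expand segment-level critiques into one label per step (0 = unlabeled)."""
    labels = [0] * traj_len
    for start, end, label in critique.segments:
        for t in range(start, min(end, traj_len)):
            labels[t] = label
    return labels


# Example: a 10-step trajectory where steps 0-3 are good and 6-9 are bad.
c = Critique(segments=[(0, 4, +1), (6, 10, -1)])
print(per_step_labels(10, c))  # → [1, 1, 1, 1, 0, 0, -1, -1, -1, -1]
```

Leaving unmarked steps as 0 keeps the "unlabeled" middle of the trajectory distinct from explicitly good or bad segments.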
“…Even though their method requires fewer action suggestions than simply receiving demonstrations in an arbitrary order, the agent must be able to freely change the state of the task to ask for guidance in the correct states, which is infeasible for most domains. Cui and Niekum (2018) use an idea very similar to that of Lopes et al. (2009) to move IRL closer to real applications. In their method, the advisee generates a trajectory that is expected to maximize the gain of knowledge, according to an uncertainty function similar to Lopes'.…”
Section: Inverse Reinforcement Learning
confidence: 99%
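The uncertainty-driven query generation described above can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation: among candidate trajectories, the agent queries the one whose return varies most across sampled reward hypotheses, i.e. where its current belief is most uncertain.

```python
# Illustrative sketch of uncertainty-driven query selection: ask about the
# candidate trajectory where sampled reward functions disagree most.
import statistics


def trajectory_return(trajectory, reward_fn):
    """Sum a reward function over the states of a trajectory."""
    return sum(reward_fn(s) for s in trajectory)


def select_query(candidates, reward_samples):
    """Pick the trajectory whose return has the highest variance across
    sampled reward hypotheses (a simple disagreement-based uncertainty)."""
    def uncertainty(traj):
        returns = [trajectory_return(traj, r) for r in reward_samples]
        return statistics.pvariance(returns)
    return max(candidates, key=uncertainty)


# Toy example: states are numbers; hypotheses disagree about reward sign.
candidates = [[1, 2, 3], [0, 0, 0], [5, -5, 5]]
reward_samples = [lambda s: s, lambda s: -s, lambda s: 0.5 * s]
print(select_query(candidates, reward_samples))  # → [1, 2, 3]
```

Return variance over posterior samples is one common proxy for information gain; the actual uncertainty function in the cited work differs, but the query-selection structure is the same.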
“…Such data can be onerous and time-consuming for users to provide. Recent work on active learning for inverse RL has sought to reduce the required number of demonstrations [2,6,26,5], but still requires some number of demonstrations to be provided manually. Our method only requires a modest number of examples of successful outcomes, followed by binary queries in which the user indicates whether a particular outcome the robot achieved is successful or not.…”
Section: Related Work
confidence: 99%