2021
DOI: 10.3389/frobt.2021.693050

Machine Teaching for Human Inverse Reinforcement Learning

Abstract: As robots continue to acquire useful skills, their ability to teach their expertise will provide humans the two-fold benefit of learning from robots and collaborating fluently with them. For example, robot tutors could teach handwriting to individual students and delivery robots could convey their navigation conventions to better coordinate with nearby human workers. Because humans naturally communicate their behaviors through selective demonstrations, and comprehend others’ through reasoning that resembles in…

Cited by 8 publications (11 citation statements)
References 36 publications

“…IRL) but also their beliefs and subsequently what counterfactuals they would consider. We thus extend our previous work [13] to evaluate a demonstration's informativeness based on counterfactuals generated via potential reward functions on the human's mind as opposed to counterfactuals generated via one-action deviations, and scaffold by showing demonstrations of increasing informativeness.…”
Section: Proposed Techniques For Teaching Humans (mentioning)
confidence: 81%
“…Brown and Niekum [12] proposed the Set Cover Optimal Teaching (SCOT) algorithm for selecting demonstrations that provide the tightest constraints on a target reward function for a pure IRL learner. However, human learning is more multifaceted and our prior work [13] tailored SCOT for humans by incorporating human learning techniques and concepts such as scaffolding. Our initial method of scaffolding via IRL did not yield significant learning gains, which we aim to improve in this work by incorporating counterfactuals that are based on the human's beliefs regarding the robot's reward function.…”
Section: Related Work (mentioning)
confidence: 99%
“…Explanations help establish a connection between what has been observed and its causes, and serve as a principled basis for generalization [14]. Consequently, explanations scaffold causal learning and have a crucial role in inference [44]. Following this idea, our work also generate explanations in the form of sentence-trajectories and uses maximum likelihood inverse reinforcement learning to find a weighting of the state features that (locally) maximizes the probability of these trajectories.…”
Section: Learning Rewards From Explanations (mentioning)
confidence: 99%