2019
DOI: 10.1609/aaai.v33i01.33017749

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

Abstract: Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding maximally informative demonstrations for IRL as a machine teaching problem where the goal is to find the minimum number of demonstrations needed to specify the rewar…
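As a point of reference, the objective sketched below is one plausible reading of the machine teaching formalization described in the abstract; the notation BEC(·), denoting the set of reward weights consistent with a policy or demonstration set, is assumed here, following Brown and Niekum's behavioral equivalence class:

\[
\min_{D \subseteq \mathcal{D}} \; |D| \quad \text{s.t.} \quad \mathrm{BEC}(D \mid \pi^{*}) \;=\; \mathrm{BEC}(\pi^{*})
\]

That is, find the smallest demonstration set whose induced constraints pin down the same reward equivalence class as the full optimal policy.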

Cited by 49 publications (77 citation statements)
References 22 publications
“…then following π after, and Q*(s, a) refers to the optimal Q-value in a state and a possible action (Watkins and Dayan, 1992). Brown and Niekum (2019) proved that the BEC(D|π) of a set of demonstrations D of a policy π can be formulated similarly as the intersection of the following half-spaces…”
Section: BEC(π) (mentioning)
Confidence: 99%
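The half-space formulation the excerpt alludes to can be sketched as follows; the feature-expectation notation μ^π_{(s,a)} (expected discounted feature counts from taking action a in state s and following π thereafter) is an assumption here, following the standard linear-reward setting R(s) = w^T φ(s):

\[
\mathrm{BEC}(D \mid \pi) \;=\; \bigcap_{(s,a) \in D} \; \bigcap_{b \in \mathcal{A}} \Big\{ w \;:\; w^{\top}\big(\mu^{\pi}_{(s,a)} - \mu^{\pi}_{(s,b)}\big) \;\ge\; 0 \Big\}
\]

Each half-space encodes that, for any reward weight w consistent with the demonstrations, the demonstrated action a is at least as good as every alternative action b in that state.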
“…Cakmak and Lopes (2012) instead focused on IRL learners and selected demonstrations that maximally reduced uncertainty over all viable reward parameters, posed as a volume removal problem. Brown and Niekum (2019) improved this method (particularly for high dimensions) by solving an equivalent set cover problem instead with their Set Cover Optimal Teaching (SCOT) algorithm. However, SCOT is not explicitly designed for human learners and this paper builds on SCOT to address that gap.…”
Section: Related Work (mentioning)
Confidence: 99%
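To make the set-cover reduction concrete, below is a minimal greedy sketch of SCOT-style demonstration selection. It is not the authors' implementation: the function name greedy_scot, the representation of each demonstration as a set of constraint ids, and the toy data are hypothetical, assuming only that each candidate demonstration induces a set of BEC half-space constraints and that the goal is to cover the constraints defining BEC(π) with as few demonstrations as possible.

# Illustrative greedy set-cover sketch of SCOT-style demonstration selection.
# Assumptions (not from the excerpt): each candidate demonstration maps to the
# set of BEC half-space constraints it induces, and we greedily pick
# demonstrations until the target constraint set is covered.

def greedy_scot(candidate_constraints, target_constraints):
    """Pick demonstrations greedily until their constraints cover the target.

    candidate_constraints: dict mapping demo id -> set of constraint ids
    target_constraints: set of constraint ids defining BEC(pi)
    Returns a list of selected demo ids (a standard set-cover approximation).
    """
    uncovered = set(target_constraints)
    selected = []
    while uncovered:
        # Choose the demonstration covering the most still-uncovered constraints.
        best_demo = max(candidate_constraints,
                        key=lambda d: len(candidate_constraints[d] & uncovered))
        gain = candidate_constraints[best_demo] & uncovered
        if not gain:  # remaining constraints cannot be covered by any candidate
            break
        selected.append(best_demo)
        uncovered -= gain
    return selected


if __name__ == "__main__":
    # Toy example with hypothetical constraint ids.
    demos = {"d1": {1, 2}, "d2": {2, 3, 4}, "d3": {4, 5}}
    print(greedy_scot(demos, {1, 2, 3, 4, 5}))  # e.g. ['d2', 'd1', 'd3']

The greedy rule gives the usual logarithmic approximation guarantee for set cover, which is why the reduction scales better in high dimensions than volume-removal formulations.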
“…One way to learn such latent structure can be in the form of a reward function obtained from Inverse Reinforcement Learning as described in Ng et al [28], Zhifei and Meng Joo [38], Brown and Niekum [4]. However, it is not always clear that the underlying true reward, in the sense of being the unique reward an expert may have used, is re-constructable or even if it can be sufficiently approximated.…”
Section: Learning From Demonstration (mentioning)
Confidence: 99%
“…This person may be an expert in the problem domain or, as in our work, an individual whose behavior we want to better understand. Because researchers realize the value of IRL for training artificial agents in solving difficult problems, researchers have introduced several IRL variations, including Maximum Entropy [38, 39], Relative Entropy [40, 41], and Bayesian IRL [42, 43]. Existing techniques can be broadly categorized into model-based approaches [39] and model-free approaches [41, 44].…”
Section: Introduction (mentioning)
Confidence: 99%