2019
DOI: 10.48550/arxiv.1909.05477
Preprint

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

Dexter R. R. Scobee,
S. Shankar Sastry

Abstract: While most approaches to the problem of Inverse Reinforcement Learning (IRL) focus on estimating a reward function that best explains an expert agent's policy or demonstrated behavior on a control task, it is often the case that such behavior is more succinctly described by a simple reward combined with a set of hard constraints. In this setting, the agent is attempting to maximize cumulative rewards subject to these given constraints on their behavior. We reformulate the problem of IRL on Markov Decision Proc…

Cited by 2 publications (8 citation statements)
References 12 publications
“…The number of iterations η is a hyperparameter. First, the optimal policy is learned in the nominal (unconstrained) MDP M. From this nominal policy and the set of expert trajectories, one state-action pair is selected by the principle of maximum likelihood [7] and added to the set of constraints C. This is the state that has the highest likelihood under the nominal policy but does not occur in any expert trajectory: a state that is not visited by the expert but will likely be visited by an agent that is unaware of the implicit rules of the environment.…”
Section: Methods
confidence: 99%
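The selection rule quoted above can be written compactly. The following is a minimal sketch, not code from the cited works: `nominal_visitation` (expected state-action visitation frequencies under the nominal optimal policy) and `expert_trajectories` (lists of state-action pairs) are assumed, illustrative inputs.

```python
def select_constraint(nominal_visitation, expert_trajectories):
    """Pick the (state, action) pair most likely under the nominal policy
    that never appears in any expert trajectory (hypothetical sketch)."""
    # Every state-action pair the expert actually demonstrated.
    observed = {pair for traj in expert_trajectories for pair in traj}
    # Pairs the unconstrained agent visits but the expert never does.
    candidates = {
        pair: freq
        for pair, freq in nominal_visitation.items()
        if pair not in observed
    }
    if not candidates:
        return None  # every likely pair was demonstrated; nothing to constrain
    # The most-visited undemonstrated pair is the maximum-likelihood candidate
    # for an implicit constraint.
    return max(candidates, key=candidates.get)
```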
“…Since the set of observed trajectories is finite, the absence of a state-action pair in the observations does not necessarily imply that this pair is invalid. To infer the state-action pairs that are most likely invalid, we build on the principle of maximum likelihood constraint inference [7]. More formally, Scobee et al. propose an iterative process in which, at each iteration, a constraint c* ∈ C is added to the set of constraints C. This is the constraint that, when the MDP is augmented with it, maximizes the likelihood of the observed trajectories.…”
Section: Constraint Inference
confidence: 99%
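Putting the two quoted descriptions together, the outer loop might look like the sketch below. It reuses the hypothetical `select_constraint` above as a greedy proxy for the exact likelihood-maximizing choice, and `solve_mdp` and `augment` are assumed helper callables standing in for the planning and MDP-augmentation steps; none of these names come from the cited papers.

```python
def infer_constraints(nominal_mdp, expert_trajectories, num_iterations,
                      solve_mdp, augment):
    """Greedy iterative constraint inference (hedged sketch).

    solve_mdp(mdp) is assumed to return expected state-action visitation
    frequencies under the optimal policy of `mdp`; augment(mdp, c) is assumed
    to return a copy of `mdp` in which the constrained pair is inadmissible.
    """
    constraints = set()
    mdp = nominal_mdp
    for _ in range(num_iterations):     # eta iterations, a hyperparameter
        visitation = solve_mdp(mdp)     # re-plan in the currently constrained MDP
        c_star = select_constraint(visitation, expert_trajectories)
        if c_star is None:
            break
        constraints.add(c_star)         # greedily add the selected constraint
        mdp = augment(mdp, c_star)      # and augment the MDP with it
    return constraints
```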