2020
DOI: 10.48550/arxiv.2008.03525
Preprint

Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

Oleg Arenz,
Gerhard Neumann

Abstract: Many modern methods for imitation learning and inverse reinforcement learning, such as GAIL or AIRL, are based on an adversarial formulation. These methods apply GANs to match the expert's distribution over states and actions with the implicit state-action distribution induced by the agent's policy. However, by framing imitation learning as a saddle point problem, adversarial methods can suffer from unstable optimization, and convergence can only be shown for small policy updates. We address these problems by …
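For context, the adversarial formulation the abstract refers to is, in the saddle-point form introduced by GAIL (Ho & Ermon, 2016),

\[ \min_{\pi}\,\max_{D}\; \mathbb{E}_{\pi}\!\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\!\left[\log\bigl(1 - D(s,a)\bigr)\right] - \lambda H(\pi), \]

where \(D\) is the discriminator, \(\pi_E\) the expert policy, and \(H(\pi)\) a causal-entropy regularizer; the instability mentioned above arises from solving this min-max jointly with function approximation.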

Cited by 4 publications (4 citation statements)
References 16 publications

“…Make further improvement: MGAIL [7], InfoGAIL [35]; Apply to other research question: MAGAIL [63], GAIfO [70]; Other generative model: Diverse GAIL [76], GIRL [77]. …problem, recent research such as [3] tries to alleviate this problem by formulating the distribution-matching problem as an iterative lower-bound optimization problem.…”
Section: GAILs Methods
confidence: 99%
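One way such a non-adversarial, lower-bound iteration can be set up (a sketch based on our reading, not necessarily the exact bound used in [3]): writing \(d_\pi\) and \(d_E\) for the agent's and expert's state-action distributions and fixing a reference policy \(\pi_{\text{old}}\), the reverse-KL matching objective satisfies

\[ -\mathrm{KL}\!\left(d_\pi \,\middle\|\, d_E\right) = \mathbb{E}_{d_\pi}\!\left[\log\frac{d_E(s,a)}{d_{\pi_{\text{old}}}(s,a)}\right] - \mathrm{KL}\!\left(d_\pi \,\middle\|\, d_{\pi_{\text{old}}}\right) \;\ge\; \mathbb{E}_{d_\pi}\!\left[\log\frac{d_E(s,a)}{d_{\pi_{\text{old}}}(s,a)}\right] - \mathbb{E}_{p_\pi}\!\left[\textstyle\sum_t \mathrm{KL}\!\left(\pi(\cdot\mid s_t)\,\middle\|\,\pi_{\text{old}}(\cdot\mid s_t)\right)\right], \]

since the trajectory-level KL upper-bounds the occupancy-level one. The bound is tight at \(\pi = \pi_{\text{old}}\), so each iteration reduces to a KL-regularized RL problem with reward \(\log\bigl(d_E(s,a)/d_{\pi_{\text{old}}}(s,a)\bigr)\), and re-solving it with the updated policy as the new reference tightens the bound without an adversarial inner loop.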
“…Recently, some non-adversarial imitation learning approaches have been proposed. For example, offline non-adversarial imitation learning (Arenz and Neumann 2020) reduces the min-max in ValueDICE to policy iteration, which however still requires estimating density ratios and can potentially inherit the same issues from ValueDICE. By contrast, D2-Imitation avoids density ratio estimation and is more robust to different demonstrations.…”
Section: Related Work
confidence: 99%
“…Unlike FORM, which uses effect models (see Figure 2) that are suitable for imitation from observations, this work models quantities that are useful primarily in conjunction with actions (modeling state-action densities and/or dynamics models for GAIL augmentation). Other recently proposed methods learn reward models either purely or partially offline (Kostrikov et al., 2020; Jarrett et al., 2020; Arenz & Neumann, 2020). This approach leans on the presence of actions in the demonstrator data.…”
Section: Background and Related Work
confidence: 99%
“…Historically, the IRL procedure has been framed as matching the expected distribution over states and actions (or their features) along the imitator and demonstrator paths (Ng & Russell, 2000; Abbeel & Ng, 2004; Ziebart et al., 2008). As also noted in (Arenz & Neumann, 2020), we can express this as a divergence minimization problem:…”
Section: Inverse Reinforcement Learning From Observations
confidence: 99%
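The expression this quotation breaks off before is not reproduced in the indexed snippet; a generic form of the divergence-minimization view it refers to is

\[ \min_{\pi}\; D\!\left(d_\pi(s,a)\,\middle\|\,d_E(s,a)\right), \]

where \(d_\pi\) and \(d_E\) are the imitator's and demonstrator's state-action occupancy distributions and \(D\) is a divergence such as the forward or reverse KL. Classical apprenticeship learning corresponds to matching expected features of these distributions, while GAIL-style methods minimize a Jensen-Shannon-like divergence estimated adversarially.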