2020
DOI: 10.48550/arxiv.2008.03525
Preprint

Non-Adversarial Imitation Learning and its Connections to Adversarial Methods

Oleg Arenz,
Gerhard Neumann

Abstract: Many modern methods for imitation learning and inverse reinforcement learning, such as GAIL or AIRL, are based on an adversarial formulation. These methods apply GANs to match the expert's distribution over states and actions with the implicit state-action distribution induced by the agent's policy. However, by framing imitation learning as a saddle point problem, adversarial methods can suffer from unstable optimization, and convergence can only be shown for small policy updates. We address these problems by …
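For context, the adversarial formulation the abstract refers to is, in the saddle-point form introduced by GAIL (Ho & Ermon, 2016),

\[ \min_{\pi}\,\max_{D}\; \mathbb{E}_{\pi}\!\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\!\left[\log\bigl(1 - D(s,a)\bigr)\right] - \lambda H(\pi), \]

where \(D\) is the discriminator, \(\pi_E\) the expert policy, and \(H(\pi)\) a causal-entropy regularizer; the instability mentioned above arises from solving this min-max jointly with function approximation.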

Cited by 4 publications (4 citation statements)
References 16 publications

“…Make further improvement: MGAIL [7], InfoGAIL [35]; Apply to other research question: MAGAIL [63], GAIfO [70]; Other generative model: Diverse GAIL [76], GIRL [77]. …problem, recent research such as [3] tries to alleviate this problem by formulating the distribution-matching problem as an iterative lower-bound optimization problem.…”
Section: GAILs Methods
confidence: 99%
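One way such a non-adversarial, lower-bound iteration can be set up (a sketch based on our reading, not necessarily the exact bound used in [3]): writing \(d_\pi\) and \(d_E\) for the agent's and expert's state-action distributions and fixing a reference policy \(\pi_{\text{old}}\), the reverse-KL matching objective satisfies

\[ -\mathrm{KL}\!\left(d_\pi \,\middle\|\, d_E\right) = \mathbb{E}_{d_\pi}\!\left[\log\frac{d_E(s,a)}{d_{\pi_{\text{old}}}(s,a)}\right] - \mathrm{KL}\!\left(d_\pi \,\middle\|\, d_{\pi_{\text{old}}}\right) \;\ge\; \mathbb{E}_{d_\pi}\!\left[\log\frac{d_E(s,a)}{d_{\pi_{\text{old}}}(s,a)}\right] - \mathbb{E}_{p_\pi}\!\left[\textstyle\sum_t \mathrm{KL}\!\left(\pi(\cdot\mid s_t)\,\middle\|\,\pi_{\text{old}}(\cdot\mid s_t)\right)\right], \]

since the trajectory-level KL upper-bounds the occupancy-level one. The bound is tight at \(\pi = \pi_{\text{old}}\), so each iteration reduces to a KL-regularized RL problem with reward \(\log\bigl(d_E(s,a)/d_{\pi_{\text{old}}}(s,a)\bigr)\), and re-solving it with the updated policy as the new reference tightens the bound without an adversarial inner loop.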
“…Recently, some non-adversarial imitation learning approaches have been proposed. For example, offline non-adversarial imitation learning (Arenz and Neumann 2020) reduces the min-max in ValueDICE to policy iteration, which however still requires estimating density ratios and can potentially inherit the same issues from ValueDICE. By contrast, D2-Imitation avoids density ratio estimation and is more robust to different demonstrations.…”
Section: Related Work
confidence: 99%
“…Unlike FORM, which uses effect models (see Figure 2) that are suitable for imitation from observations, this work models quantities that are useful primarily in conjunction with actions (modeling state-action densities and/or dynamics models for GAIL augmentation). Other recently proposed methods learn reward models either purely or partially offline (Kostrikov et al., 2020; Jarrett et al., 2020; Arenz & Neumann, 2020). This approach leans on the presence of actions in the demonstrator data.…”
Section: Background and Related Work
confidence: 99%
“…Historically, the IRL procedure has been framed as matching the expected distribution over states and actions (or their features) along the imitator and demonstrator paths (Ng & Russell, 2000; Abbeel & Ng, 2004; Ziebart et al., 2008). As also noted in (Arenz & Neumann, 2020), we can express this as a divergence minimization problem:…”
Section: Inverse Reinforcement Learning From Observations
confidence: 99%
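The expression this quotation breaks off before is not reproduced in the indexed snippet; a generic form of the divergence-minimization view it refers to is

\[ \min_{\pi}\; D\!\left(d_\pi(s,a)\,\middle\|\,d_E(s,a)\right), \]

where \(d_\pi\) and \(d_E\) are the imitator's and demonstrator's state-action occupancy distributions and \(D\) is a divergence such as the forward or reverse KL. Classical apprenticeship learning corresponds to matching expected features of these distributions, while GAIL-style methods minimize a Jensen-Shannon-like divergence estimated adversarially.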