2018
DOI: 10.48550/arxiv.1804.08606
Preprint

Zero-Shot Visual Imitation

Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both what and how to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference. The learned policy is then employed to mimic…
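
A minimal sketch of how such a setup could look is given below, assuming the forward consistency term scores the predicted action by the state it would reach rather than by an exact action match. This is not the authors' implementation; it assumes PyTorch, low-dimensional state features, and a discrete action space, and all module names, variable names, sizes, and loss weights are illustrative assumptions.

# Minimal sketch (not the paper's code) of a goal-conditioned inverse model
# trained with a forward-consistency term on self-supervised transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, HIDDEN = 32, 4, 128  # illustrative sizes

class GoalConditionedPolicy(nn.Module):
    # Inverse model: predict which action moves the current state toward the goal state.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, N_ACTIONS))

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))  # action logits

class ForwardModel(nn.Module):
    # Predict next-state features from the current state and a (soft) one-hot action.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, STATE_DIM))

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

policy, fwd = GoalConditionedPolicy(), ForwardModel()
opt = torch.optim.Adam(list(policy.parameters()) + list(fwd.parameters()), lr=1e-3)

def training_step(s_t, a_t, s_next):
    # (s_t, a_t, s_next): one transition collected during the agent's own exploration.
    logits = policy(s_t, s_next)                  # the observed next state acts as the goal
    action_loss = F.cross_entropy(logits, a_t)    # standard inverse-model loss
    a_pred = F.softmax(logits, dim=-1)            # soft one-hot of the predicted action
    a_true = F.one_hot(a_t, N_ACTIONS).float()
    # Forward consistency: the predicted action should reach the observed next state,
    # even when it differs from the recorded action label.
    consistency_loss = F.mse_loss(fwd(s_t, a_pred), s_next)
    forward_loss = F.mse_loss(fwd(s_t, a_true), s_next)  # keeps the forward model grounded
    loss = action_loss + consistency_loss + forward_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

At inference time the same policy would simply be queried with the current observation and the next frame of the expert demonstration as the goal, and the highest-scoring action executed; no expert actions are ever observed.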

Cited by 6 publications (8 citation statements)
References 19 publications

“…One way to approach the aforementioned problem is to "learn to imitate" (as opposed to imitation learning), i.e., by doing some preprocessing, enable the agent to follow a single demonstration exactly. Two such approaches are proposed by Nair et al. (2017) and Pathak et al. (2018). These methods first learn an inverse dynamics model through self-supervised exploration, and then use it to infer the demonstrator's action at each step and perform that in the environment.…”
Section: Related Work
confidence: 99%
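The zero-shot following procedure described in the statement above (infer the demonstrator's action from consecutive demonstration frames, execute it, repeat) can be sketched as follows. This is an illustrative reconstruction rather than the cited authors' code: it assumes a classic Gym-style environment whose step() returns (obs, reward, done, info), and an inverse_model(obs, goal_obs) -> action callable obtained beforehand from self-supervised exploration.

# Illustrative sketch of following a single demonstration with a learned
# inverse dynamics model; interface names are assumptions, not the cited code.
def follow_demonstration(env, inverse_model, demo_frames):
    # demo_frames: the expert's observation sequence [o_0, o_1, ..., o_T]; no actions are given.
    obs = env.reset()
    for goal_obs in demo_frames[1:]:           # aim for each demonstration frame in turn
        action = inverse_model(obs, goal_obs)  # infer the action the demonstrator likely took
        obs, _reward, done, _info = env.step(action)
        if done:
            break
    return obs                                 # final observation after attempting the demo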
“…We are, however, interested in environments with continuous high-dimensional observation spaces. While there is extensive prior work on learning goal-conditioned policies (Kaelbling, 1993; Schaul et al., 2015; Andrychowicz et al., 2017; Held et al., 2017; Pathak et al., 2018), the reward function is often hand-crafted, limiting the generality of the approaches. In the few cases where the reward is learned, the learning objective is typically tied to a pre-specified notion of visual similarity.…”
Section: Problem Formulation
confidence: 99%
“…Experiments showed that once a UPN is trained, the state representation it has learned can be used to construct a reward function for visually specified goals. Bridging goal-conditioned policy learning and imitation learning, Pathak et al. (2018) learn a goal-conditioned policy and a dynamics model with supervised learning, without expert trajectories, and present zero-shot imitation of trajectories from a sequence of images of a desired task.…”
Section: Related Work
confidence: 99%
“…A recent approach aimed to learn from observations by first learning how to imitate in a self-supervised manner and then, given a task, attempting it zero-shot (Pathak et al., 2018). However, this approach requires learning in the agent's environment first rather than initially learning from the observations.…”
Section: Learning From State Observations
confidence: 99%
“…A recent approach for overcoming these issues is to learn an initial self-supervised model for how to imitate by collecting experiences within the environment and then using this learned model to infer policies from expert observations (Pathak et al., 2018; Torabi et al., 2018a). However, unguided exploration can be risky in many real-world scenarios and costly to obtain.…”
Section: Introduction
confidence: 99%