2018
DOI: 10.48550/arxiv.1804.08606
Preprint

Zero-Shot Visual Imitation

Abstract: The current dominant paradigm for imitation learning relies on strong supervision of expert actions to learn both what and how to imitate. We pursue an alternative paradigm wherein an agent first explores the world without any expert supervision and then distills its experience into a goal-conditioned skill policy with a novel forward consistency loss. In our framework, the role of the expert is only to communicate the goals (i.e., what to imitate) during inference. The learned policy is then employed to mimic…
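
A minimal sketch of how such a setup could look is given below, assuming the forward consistency term scores the predicted action by the state it would reach rather than by an exact action match. This is not the authors' implementation; it assumes PyTorch, low-dimensional state features, and a discrete action space, and all module names, variable names, sizes, and loss weights are illustrative assumptions.

# Minimal sketch (not the paper's code) of a goal-conditioned inverse model
# trained with a forward-consistency term on self-supervised transitions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, HIDDEN = 32, 4, 128  # illustrative sizes

class GoalConditionedPolicy(nn.Module):
    # Inverse model: predict which action moves the current state toward the goal state.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, N_ACTIONS))

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))  # action logits

class ForwardModel(nn.Module):
    # Predict next-state features from the current state and a (soft) one-hot action.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, STATE_DIM))

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

policy, fwd = GoalConditionedPolicy(), ForwardModel()
opt = torch.optim.Adam(list(policy.parameters()) + list(fwd.parameters()), lr=1e-3)

def training_step(s_t, a_t, s_next):
    # (s_t, a_t, s_next): one transition collected during the agent's own exploration.
    logits = policy(s_t, s_next)                  # the observed next state acts as the goal
    action_loss = F.cross_entropy(logits, a_t)    # standard inverse-model loss
    a_pred = F.softmax(logits, dim=-1)            # soft one-hot of the predicted action
    a_true = F.one_hot(a_t, N_ACTIONS).float()
    # Forward consistency: the predicted action should reach the observed next state,
    # even when it differs from the recorded action label.
    consistency_loss = F.mse_loss(fwd(s_t, a_pred), s_next)
    forward_loss = F.mse_loss(fwd(s_t, a_true), s_next)  # keeps the forward model grounded
    loss = action_loss + consistency_loss + forward_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

At inference time the same policy would simply be queried with the current observation and the next frame of the expert demonstration as the goal, and the highest-scoring action executed; no expert actions are ever observed.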

Cited by 6 publications (8 citation statements)
References 19 publications

“…One way to approach the aforementioned problem is to "learn to imitate" (as opposed to imitation learning), i.e., by doing some preprocessing, enable the agent to follow a single demonstration exactly. Two such approaches are proposed by Nair et al. (2017) and Pathak et al. (2018). These methods first learn an inverse dynamics model through self-supervised exploration, and then use it to infer the demonstrator's action at each step and perform that in the environment.…”
Section: Related Work
confidence: 99%
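The zero-shot following procedure described in the statement above (infer the demonstrator's action from consecutive demonstration frames, execute it, repeat) can be sketched as follows. This is an illustrative reconstruction rather than the cited authors' code: it assumes a classic Gym-style environment whose step() returns (obs, reward, done, info), and an inverse_model(obs, goal_obs) -> action callable obtained beforehand from self-supervised exploration.

# Illustrative sketch of following a single demonstration with a learned
# inverse dynamics model; interface names are assumptions, not the cited code.
def follow_demonstration(env, inverse_model, demo_frames):
    # demo_frames: the expert's observation sequence [o_0, o_1, ..., o_T]; no actions are given.
    obs = env.reset()
    for goal_obs in demo_frames[1:]:           # aim for each demonstration frame in turn
        action = inverse_model(obs, goal_obs)  # infer the action the demonstrator likely took
        obs, _reward, done, _info = env.step(action)
        if done:
            break
    return obs                                 # final observation after attempting the demo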
“…We are, however, interested in environments with continuous high-dimensional observation spaces. While there is extensive prior work on learning goal-conditioned policies (Kaelbling, 1993; Schaul et al., 2015; Andrychowicz et al., 2017; Held et al., 2017; Pathak et al., 2018), the reward function is often hand-crafted, limiting the generality of the approaches. In the few cases where the reward is learned, the learning objective is typically tied to a pre-specified notion of visual similarity.…”
Section: Problem Formulation
confidence: 99%
“…Experiments showed that once a UPN is trained, the state representation it has learned can be used to construct a reward function for visually specified goals. Bridging goal-conditioned policy learning and imitation learning, Pathak et al. (2018) learn a goal-conditioned policy and a dynamics model with supervised learning, without expert trajectories, and present zero-shot imitation of trajectories from a sequence of images of a desired task.…”
Section: Related Work
confidence: 99%
“…A recent approach aimed to learn from observations by first learning how to imitate in a self-supervised manner and then, given a task, attempting it zero-shot (Pathak et al., 2018). However, this approach requires learning in the agent's environment first rather than initially learning from the observations.…”
Section: Learning From State Observations
confidence: 99%
“…A recent approach for overcoming these issues is to learn an initial self-supervised model for how to imitate by collecting experiences within the environment and then using this learned model to infer policies from expert observations (Pathak et al., 2018; Torabi et al., 2018a). However, unguided exploration can be risky in many real-world scenarios and costly to obtain.…”
Section: Introduction
confidence: 99%