2020 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra40945.2020.9197020
Scalable Multi-Task Imitation Learning with Autonomous Improvement

Cited by 25 publications (18 citation statements)
References 11 publications
“…Recent works have focused on scaling up multitask imitation learning over unstructured data [8,26]. However, these approaches typically assume that tasks are specified to the agent at test time via mechanisms like one-hot task selectors [37,43], goal images [29,11,26], or target configurations of the state space [14]. While these types of conditioning are straightforward to provide in simulation, they are often impractical to provide in open-world settings.…”
Section: Introduction
confidence: 99%
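The conditioning mechanisms named in the excerpt above (one-hot task selectors, goal images) are different ways of telling the policy which task to perform at test time. A minimal sketch of the simplest of these, one-hot task conditioning, is below; the function names and dimensions are illustrative assumptions, not from the cited papers.

```python
import numpy as np

def one_hot(task_id: int, num_tasks: int) -> np.ndarray:
    """Encode a task as a one-hot selector vector."""
    v = np.zeros(num_tasks, dtype=np.float32)
    v[task_id] = 1.0
    return v

def condition_policy_input(obs: np.ndarray, task_id: int, num_tasks: int) -> np.ndarray:
    """Concatenate the observation with a one-hot task selector.

    The combined vector is what a multi-task policy network would
    consume; swapping the selector for an encoded goal image yields
    the goal-conditioned variant discussed in the excerpt.
    """
    return np.concatenate([obs, one_hot(task_id, num_tasks)])

# Example: a 3-dim observation conditioned on task 1 of 4.
x = condition_policy_input(np.array([0.5, -0.2, 0.1], dtype=np.float32), 1, 4)
```

As the excerpt notes, such selectors are trivial to supply in simulation (the simulator knows the task index) but impractical in open-world settings, where no oracle provides the task label.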
“…Multi-task Imitation Learning for Robotic Manipulation Our work falls under the broader category of imitation learning for multiple robot manipulation tasks [47], [24], [35]. The term "multi-task" has varying definitions across this space of literature.…”
Section: Related Work (Imitation Learning)
confidence: 99%
“…On the other hand, π_S performed worse than the baseline in task T_L. As a result, π_S may not be considered a zero-shot learner [Singh et al 2020, Oh et al 2017] in terms of generalizing to all 4 tasks defined, because it was not able to generalize to the unseen task T_L. This answers RQ4.…”
Section: Policy Evaluation and Improvement
confidence: 99%