An Algorithmic Perspective on Imitation Learning

Osa, Takayuki; Pajarinen, Joni; Neumann, Gerhard; Bagnell, J. Andrew; Abbeel, Pieter; Peters, Jan

doi:10.1561/9781680834116

Cited by 174 publications

(47 citation statements)

References 18 publications


“…This baseline directly learns a policy from an initial set of demonstrations using supervised learning. This approach is called behavioral cloning (see the survey of imitation learning given by Osa et al (2018)); in each of our experiments, we describe the policy models used. It is important to note that this approach requires fully observed demonstrations.…”

Section: Experimental Methodology (mentioning)

confidence: 99%
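The quoted passage frames behavioral cloning as plain supervised learning from fully observed demonstrations. Here is a minimal sketch of that supervised step. It assumes scalar states and actions and a linear policy class $a = w s + b$ fitted by ordinary least squares; neither choice comes from the cited papers, which describe their own policy models. `fit_linear_policy` and the demonstration data are hypothetical.

```python
from typing import Callable, List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Behavioral cloning as supervised regression (illustrative sketch).

    Fits a = w * s + b to fully observed (state, action) pairs by ordinary
    least squares. The linear policy class is an assumption for illustration.
    """
    n = len(demos)
    if n < 2:
        raise ValueError("need at least two (state, action) pairs")
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in demos)
    if var_s == 0:
        raise ValueError("states must not all be identical")
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov_sa / var_s
    b = mean_a - w * mean_s

    def policy(state: float) -> float:
        # The cloned policy replays the learned state-to-action mapping.
        return w * state + b

    return policy


# Hypothetical demonstrations from an expert that applies a = -0.5 * s + 1.
demos = [(s, -0.5 * s + 1.0) for s in [0.0, 1.0, 2.0, 3.0, 4.0]]
policy = fit_linear_policy(demos)
print(policy(10.0))  # prints -4.0
```

Every training example needs the expert's action as its label. This is why, as the snippet notes, the approach requires fully observed demonstrations.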


SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards

Krishnan, Garg, Liaw et al. 2018

The International Journal of Robotics Research


Reinforcement Learning (RL) struggles with delayed rewards; one remedy is to segment the task into sub-tasks with incremental rewards. We propose a framework called Hierarchical Inverse Reinforcement Learning (HIRL), a model for learning sub-task structure from demonstrations. HIRL decomposes the task into sub-tasks based on transitions that are consistent across demonstrations. These transitions are defined as changes in local linearity with respect to a kernel function [21]. HIRL then uses the inferred structure to learn reward functions local to the sub-tasks while also handling global dependencies such as sequentiality. We have evaluated HIRL on several standard RL benchmarks: Parallel Parking with noisy dynamics, Two-Link Pendulum, 2D Noisy Motion Planning, and a Pinball environment. In the parallel parking task, rewards constructed with HIRL converge to a policy with an 80% success rate in 32% fewer time-steps than those constructed with Maximum Entropy Inverse RL (MaxEnt IRL). With partial state observation, the policies learned with IRL fail to achieve this accuracy, while HIRL still converges. We further find that the rewards learned with HIRL are robust to environment noise: they tolerate one standard deviation of random perturbation in the poses of the environment's obstacles while maintaining roughly the same convergence rate. HIRL rewards can converge up to 6× faster than rewards constructed with IRL.
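The abstract says HIRL segments demonstrations at changes in local linearity. HIRL measures this with respect to a kernel function and then keeps only the transitions that recur across demonstrations. The sketch below is a deliberately simplified stand-in for the first step only, and it is not HIRL's method. It assumes scalar states and a local model $x_{t+1} \approx a\,x_t$ fitted by least squares on a fixed window. The window size, the threshold `tol`, and the function names are hypothetical.

```python
from typing import List


def fit_gain(xs: List[float], ys: List[float]) -> float:
    """Least-squares fit of y = a * x (no intercept) over one window."""
    denom = sum(x * x for x in xs)
    if denom == 0:
        raise ValueError("window is all zeros; cannot fit a local gain")
    return sum(x * y for x, y in zip(xs, ys)) / denom


def detect_transitions(traj: List[float], window: int = 3, tol: float = 0.5) -> List[int]:
    """Flag time steps where the locally fitted linear dynamics change.

    For each candidate step t, fit x[k+1] = a * x[k] on the `window` pairs
    ending at t and on the `window` pairs starting at t, then score t by the
    absolute difference of the two gains. A step is reported as a transition
    if its score exceeds `tol` and is a local maximum (simple non-maximum
    suppression). Simplified illustration only, not HIRL's kernel criterion.
    """
    scores = {}
    for t in range(window, len(traj) - window):
        before = fit_gain(traj[t - window:t], traj[t - window + 1:t + 1])
        after = fit_gain(traj[t:t + window], traj[t + 1:t + window + 1])
        scores[t] = abs(after - before)
    transitions = []
    for t, score in scores.items():
        left = scores.get(t - 1, float("-inf"))
        right = scores.get(t + 1, float("-inf"))
        if score > tol and score >= left and score >= right:
            transitions.append(t)
    return transitions
```

For example, `detect_transitions([1, 2, 4, 8, 16, 32, 16, 8, 4, 2, 1])` returns `[5]`, the step where the gain switches from 2 to 0.5. A second stage, omitted here, would match such transitions across several demonstrations and keep only the consistent ones.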



“…This fully observed setting is typical of the literature on imitation learning (Osa et al, 2018), where one uses a function approximation to learn the state to action mapping from these samples. However, one may have a more limited access to a limited supervisor, where only the states are observed.…”

Section: Problem Setup (mentioning)

confidence: 99%

SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards

Krishnan, Garg, Liaw et al. 2018

The International Journal of Robotics Research


“…where $i$ is the index of a sequence in a batch, $B$ the batch size, $y^i_t = [\tau^{d,i}_{pk,t}, \tau^{d,i}_{pa,t}]$ the vector of labels, $\ell^i_t$ the loss of time step $t$, $L$ the loss of a batch with sequences of length $T$. The L1 loss is used instead of the L2 loss because of its robustness to outliers [33].…”

Section: Training Procedures (mentioning)

confidence: 99%
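The quoted passage defines a batch loss over $B$ sequences of length $T$ and prefers L1 to L2 for its robustness to outliers. This is a minimal sketch under an assumed mean reduction, $L = \frac{1}{BT}\sum_{i=1}^{B}\sum_{t=1}^{T} \ell^i_t$ with $\ell^i_t = \lVert y^i_t - \hat{y}^i_t \rVert_1$, where $\hat{y}^i_t$ is the prediction. The averaging convention, the squared-L2 comparison, and all names are illustrative assumptions, not details from the paper.

```python
from typing import List, Sequence

# Batch layout: [sequence i][time step t][label component], e.g. two torques.
Batch = List[List[Sequence[float]]]


def step_loss(y: Sequence[float], y_hat: Sequence[float], norm: str = "l1") -> float:
    """Per-step loss l_t^i between a label vector and a prediction."""
    diffs = [a - b for a, b in zip(y, y_hat)]
    if norm == "l1":
        return sum(abs(d) for d in diffs)
    if norm == "l2":
        return sum(d * d for d in diffs)  # squared L2 error
    raise ValueError(f"unknown norm: {norm}")


def batch_loss(labels: Batch, preds: Batch, norm: str = "l1") -> float:
    """Average per-step losses over B sequences of length T (assumed mean reduction)."""
    B = len(labels)
    T = len(labels[0])
    total = 0.0
    for i in range(B):
        for t in range(T):
            total += step_loss(labels[i][t], preds[i][t], norm)
    return total / (B * T)


# One sequence of four steps; the last prediction is an outlier.
labels = [[(0.0, 0.0)] * 4]
preds = [[(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (10.0, 0.0)]]
print(batch_loss(labels, preds, "l1"))  # prints 3.25
print(batch_loss(labels, preds, "l2"))  # prints 25.75
```

The single outlying step accounts for 100 of the 103 units of squared error but only 10 of the 13 units of absolute error. This is the sense in which the L1 loss is less sensitive to outliers.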

Recurrent Neural Network Control of a Hybrid Dynamical Transfemoral Prosthesis with EdgeDRNN Accelerator

Gao, Gehlhar, Ames et al. 2020

2020 IEEE International Conference on Robotics and Automation (ICRA)


Lower-leg prostheses could improve the quality of life of amputees by increasing comfort and reducing the energy needed for locomotion, but current control methods are limited in their ability to modulate behavior based on the human's experience. This paper describes the first steps toward learning complex controllers for dynamical robotic assistive devices. We provide the first example of behavioral cloning to control a powered transfemoral prosthesis using a Gated Recurrent Unit (GRU) based recurrent neural network (RNN) running on a custom hardware accelerator that exploits temporal sparsity. The RNN is trained on data collected from the original prosthesis controller, and RNN inference runs in real time on a novel EdgeDRNN accelerator. Experimental results show that the RNN can replace the nominal PD controller to realize end-to-end control of the AMPRO3 prosthetic leg walking on flat ground and unforeseen slopes with comparable tracking accuracy. EdgeDRNN computes the RNN about 240 times faster than real time, opening the possibility of running larger networks for more complex tasks in the future. Implementing an RNN on this real-time dynamical system with impacts lays the groundwork for incorporating other learned elements of the human-prosthesis system into prosthesis control.
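The abstract attributes EdgeDRNN's speed to exploiting temporal sparsity. One common way to do this is the delta-network idea, which we assume EdgeDRNN builds on. Only input components whose change since they were last propagated exceeds a threshold are sent forward, and a running matrix-vector product is updated incrementally. The sketch below shows that idea for a single weight matrix in plain Python. The threshold `theta` and the class name are hypothetical, and this is not the accelerator's implementation.

```python
from typing import List


class DeltaMatVec:
    """Incrementally maintain y approximately equal to W @ x.

    Columns whose input changed by at most `theta` since they were last
    propagated are skipped. Illustrative sketch of the delta-network idea.
    """

    def __init__(self, W: List[List[float]], theta: float):
        self.W = W
        self.theta = theta
        self.x_ref = [0.0] * len(W[0])  # last propagated value of each input
        self.y = [0.0] * len(W)         # running product W @ x_ref
        self.skipped = 0                # columns skipped so far (temporal sparsity)

    def step(self, x: List[float]) -> List[float]:
        for j, xj in enumerate(x):
            delta = xj - self.x_ref[j]
            if abs(delta) <= self.theta:
                self.skipped += 1       # change too small: reuse the cached contribution
                continue
            self.x_ref[j] = xj          # remember the value actually propagated
            for r, row in enumerate(self.W):
                self.y[r] += row[j] * delta
        return list(self.y)


W = [[1.0, 2.0], [3.0, 4.0]]
m = DeltaMatVec(W, theta=0.5)
print(m.step([1.0, 1.0]))  # prints [3.0, 7.0]
print(m.step([1.0, 1.2]))  # prints [3.0, 7.0]; both columns skipped
```

With `theta = 0`, every nonzero change is propagated and `y` equals `W @ x` up to rounding. A positive `theta` keeps each input within `theta` of its propagated value, which bounds the output error, in exchange for skipped weight-column reads. In a delta GRU the same update would presumably be applied to both the input and the recurrent hidden state.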


“…Some papers mention the significance of evaluating predictive uncertainty to ensure the safety of the controller in the context of imitation learning. Previous papers have pointed out that the main problems in imitation learning lies in the inherent ambiguity of demonstrations (Goo and Niekum, 2019) or the discrepancy between training and test conditions that can lead robots to perform unexpected actions (Pomerleau, 1989; Osa et al, 2018). In practice, one possible solution is measuring the predictive uncertainty, and if the robots are uncertain about their prediction, they can stop performing actions and request that experts provide additional demonstrations (Thakur et al, 2019).…”

Section: Related Work (mentioning)

confidence: 99%
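The quoted passage describes a robot that stops and requests more demonstrations when it is uncertain about its prediction. This is a minimal sketch of that decision rule. It assumes predictive uncertainty is estimated as the variance across an ensemble of policy predictions, which is one common choice. PETNet, by contrast, infers uncertainty in a task-embedding space. The threshold `max_var` and the function name are hypothetical.

```python
from statistics import pvariance
from typing import Callable, List, Optional

Policy = Callable[[float], float]


def act_or_request(ensemble: List[Policy], state: float, max_var: float) -> Optional[float]:
    """Return the mean ensemble action, or None to signal 'stop and ask the expert'.

    Uncertainty is approximated by the population variance of the ensemble's
    predicted actions (an assumption for illustration).
    """
    actions = [policy(state) for policy in ensemble]
    if pvariance(actions) > max_var:
        return None  # members disagree: treat the state as unfamiliar and request a demonstration
    return sum(actions) / len(actions)


# Hypothetical ensemble whose members agree near s = 1 but diverge at s = 10.
ensemble = [lambda s: s, lambda s: 1.1 * s, lambda s: 0.9 * s]
print(act_or_request(ensemble, 1.0, max_var=0.1))   # close to 1.0
print(act_or_request(ensemble, 10.0, max_var=0.1))  # prints None
```

Acting on the mean when the members agree and deferring otherwise is only one simple policy. The threshold would need tuning for each task.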

Modeling Task Uncertainty for Safe Meta-Imitation Learning

et al. 2020


To endow robots with the flexibility to perform a wide range of tasks in diverse and complex environments, learning their controller from experience data is a promising approach. In particular, some recent meta-learning methods have been shown to solve novel tasks by leveraging their experience of performing other tasks during training. Although studies on meta-learning for robot control have focused on improving performance, safety, which is also an important consideration for deployment, has not been fully explored. In this paper, we first relate uncertainty in task inference to safety in meta-learning of visual imitation, and then propose a novel framework, called PETNet, for estimating task uncertainty through probabilistic inference in the task-embedding space. We validate PETNet on a manipulation task using a simulated robot arm, evaluating both task performance and the uncertainty of task inference. Following the standard benchmark procedure in meta-imitation learning, we show that PETNet achieves the same or a higher level of performance (success rate on novel tasks at meta-test time) as previous methods. In addition, when tested with semantically inappropriate or synthesized out-of-distribution demonstrations, PETNet captures the uncertainty about the tasks inherent in the given demonstrations, which allows the robot to identify situations where the controller might not perform properly. These results suggest that our proposal is a significant step toward the safe deployment of robot learning systems in diverse tasks and environments.

