Robotics: Science and Systems XIII 2017
DOI: 10.15607/rss.2017.xiii.048
Preparing for the Unknown: Learning a Universal Policy with Online System Identification

Abstract: We present a new method of learning control policies that successfully operate under unknown dynamic models. We create such policies by leveraging a large number of training examples that are generated using a physical simulator. Our system is made of two components: a Universal Policy (UP) and a function for Online System Identification (OSI). We describe our control policy as universal because it is trained over a wide array of dynamic models. These variations in the dynamic model may include differ…
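The abstract describes a two-part architecture: a Universal Policy conditioned on the dynamics parameters, and an OSI module that estimates those parameters from recent state-action history. The PyTorch sketch below shows one plausible way to wire the two networks together; the layer widths, parameter dimension `mu_dim`, and history length are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a Universal Policy (UP) + Online System Identification (OSI)
# pairing, assuming a fixed-length history window and MLP networks.
# Dimensions and layer widths are illustrative, not the paper's values.
import torch
import torch.nn as nn


class UniversalPolicy(nn.Module):
    """Policy conditioned on both the state and the dynamics parameters mu."""

    def __init__(self, state_dim, mu_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + mu_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, mu):
        # Concatenate the state with the (true or estimated) dynamics parameters.
        return self.net(torch.cat([state, mu], dim=-1))


class OSI(nn.Module):
    """Maps a short history of (state, action) pairs to an estimate of mu."""

    def __init__(self, state_dim, action_dim, mu_dim, history_len=10, hidden=128):
        super().__init__()
        in_dim = history_len * (state_dim + action_dim)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, mu_dim),
        )

    def forward(self, history):
        # history: (batch, history_len, state_dim + action_dim)
        return self.net(history.flatten(start_dim=1))
```

During training in simulation the UP can be conditioned on the true simulator parameters; at deployment the OSI estimate stands in for them.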

Cited by 164 publications (147 citation statements)
References 23 publications
“…Several works have demonstrated improved performance with uncertain and complex dynamics using the reinforcement learning (RL) framework and training with randomized system parameters. In [5], the authors use a recurrent neural network to explicitly learn model parameters through real-time interaction with an environment; these parameters are then used to augment the observation for a standard reinforcement learning algorithm. In [6], the authors use a recurrent policy and value function in a modified deep deterministic policy gradient algorithm to learn a policy for a robotic manipulator arm that uses real camera images as observations.…”
Section: Introduction (mentioning)
confidence: 99%
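The idea quoted above for [5], a recurrent network that infers model parameters online and appends them to the observation, could look roughly like the following; the LSTM size and the way the estimate is concatenated to the observation are assumptions for illustration, not details taken from the cited works.

```python
# Sketch of the quoted idea: a recurrent estimator consumes the stream of
# (state, action) pairs, and its running parameter estimate is appended to
# the observation passed to an otherwise standard RL policy.
# Sizes and wiring are illustrative assumptions.
import torch
import torch.nn as nn


class RecurrentParamEstimator(nn.Module):
    def __init__(self, state_dim, action_dim, mu_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim + action_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, mu_dim)

    def forward(self, transitions, hx=None):
        # transitions: (batch, time, state_dim + action_dim)
        out, hx = self.lstm(transitions, hx)
        mu_hat = self.head(out[:, -1])  # estimate from the latest time step
        return mu_hat, hx


def augment_observation(obs, mu_hat):
    # The estimated parameters are concatenated onto the observation, so any
    # standard RL algorithm can consume the augmented state unchanged.
    return torch.cat([obs, mu_hat], dim=-1)
```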
“…This limits most of these works to learning simple behaviours. Making policies robust for physics adaptation [36,32,51] is also receiving interest, but these methods haven't been shown to be powerful enough to work on real robots. Using bottlenecks [52] has been shown to help domain adaptation for simple tasks like reaching.…”
Section: Transfer From Simulation To the Real World (mentioning)
confidence: 99%
“…Most methods in this class try to infer the latent input using observations from the target environment. For example, Yu et al. [26] conditioned the policy on the physics parameters of the robot, and trained a separate prediction model that estimates the physics parameters given the history of observations and actions. These methods can potentially adapt to changes in environments in an online fashion.…”
Section: B. Adapting Control Policy To Novel Tasks (mentioning)
confidence: 99%
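As a rough illustration of the online adaptation described in the statement above, the loop below repeatedly estimates the physics parameters from the most recent observation-action history and feeds the estimate to the parameter-conditioned policy. Here `env` is a hypothetical Gym-style environment, and `policy` and `osi` stand in for the networks sketched after the abstract; all names and sizes are assumptions.

```python
# Hypothetical online-adaptation loop: estimate dynamics parameters from a
# rolling history, then act with the parameter-conditioned policy.
from collections import deque

import torch


def run_episode(env, policy, osi, mu_dim, history_len=10, max_steps=1000):
    state = env.reset()
    history = deque(maxlen=history_len)
    mu_hat = torch.zeros(mu_dim)  # neutral guess before enough history exists

    for _ in range(max_steps):
        s = torch.as_tensor(state, dtype=torch.float32)
        with torch.no_grad():
            action = policy(s.unsqueeze(0), mu_hat.unsqueeze(0)).squeeze(0)

        next_state, reward, done, _ = env.step(action.numpy())
        history.append(torch.cat([s, action]))

        # Re-estimate the dynamics parameters once the window is full.
        if len(history) == history_len:
            window = torch.stack(list(history)).unsqueeze(0)
            with torch.no_grad():
                mu_hat = osi(window).squeeze(0)

        state = next_state
        if done:
            break
```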