Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

Sharma, Archit; Ahn, Michael J.; Levine, Sergey; Kumar, Vikash; Hausman, Karol; Gu, Shixiang

doi:10.15607/rss.2020.xvi.053

Cited by 31 publications

(37 citation statements)

References 41 publications

(60 reference statements)

“…The concept of mutual information, which is also at the heart of empowerment based methods, has been further used to motivate several objectives for skill discovery (Florensa et al, 2017; Eysenbach et al, 2019; Achiam et al, 2018; Warde-Farley et al, 2019; Hansen et al, 2020; Sharma et al, 2020b). Recent works have shown that skills learned through mutual information can be meaningfully combined to solve downstream tasks (Eysenbach et al, 2019; Sharma et al, 2020b), even on real robots (Sharma et al, 2020a).…”

Section: Related Work (mentioning)

confidence: 99%

Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning

Choi, Sharma, Lee, et al. 2021

Preprint

Self Cite

Learning to reach goal states and learning diverse skills through mutual information (MI) maximization have been proposed as principled frameworks for self-supervised reinforcement learning, allowing agents to acquire broadly applicable multitask policies with minimal reward engineering. Starting from a simple observation that the standard goal-conditioned RL (GCRL) is encapsulated by the optimization objective of variational empowerment, we discuss how GCRL and MI-based RL can be generalized into a single family of methods, which we name variational GCRL (VGCRL), interpreting variational MI maximization, or variational empowerment, as representation learning methods that acquire functionally-aware state representations for goal reaching. This novel perspective allows us to: (1) derive simple but unexplored variants of GCRL to study how adding small representation capacity can already expand its capabilities; (2) investigate how discriminator function capacity and smoothness determine the quality of discovered skills, or latent goals, through modifying latent dimensionality and applying spectral normalization; (3) adapt techniques such as hindsight experience replay (HER) from GCRL to MI-based RL; and lastly, (4) propose a novel evaluation metric, named latent goal reaching (LGR), for comparing empowerment algorithms with different choices of latent dimensionality and discriminator parameterization. Through principled mathematical derivations and careful experimental studies, our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.

“…It is also claimed that RL is an effective training method for these systems. The work of (Sharma et al, 2020) uses a free-reward RL algorithm to teach a robot how to travel properly within its structure and navigate, 20 hours of preparation, the machine had mastered a variety of locomotion gaits. After a short time, the RL methods yielded results.…”

Section: Reinforcement Learning In Real Environments (mentioning)

confidence: 99%

Artificial intelligence-controlled pole balancing using an Arduino board

Orellana, Chang 2021

rte

Automation Process (AP) is an important issue in the current digitized world and, in general, represents an increase in the quality of productivity when compared with manual control. Balance is a natural human capacity as it relates to complex operations and intelligence. Balance Control presents an extra challenge in automation processes, due to the many variables that may be involved. This work presents a physical balancing pole where a Reinforcement Learning (RL) agent can explore the environment, sense its position through accelerometers, and wirelessly communicate and eventually learns by itself how to keep the pole balanced under noise disturbance. The agent uses RL principles to explore and learn new positions and corrections that lead toward more significant rewards in terms of pole equilibrium. By using a Q-matrix, the agent explores future conditions and acquires policy information that makes it possible to maintain stability. An Arduino microcontroller processes all training and testing. With the help of sensors, servo motors, wireless communications, and artificial intelligence, components merge into a system that consistently recovers equilibrium under random position changes. The obtained results prove that through RL, an agent can learn by itself to use generic sensors, actuators and solve balancing problems even under the limitations that a microcontroller presents.
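
The abstract above mentions a Q-matrix but gives no details of the update rule, state discretization, or hyperparameters. As a rough illustration only, the following Python sketch shows a standard tabular Q-learning update with epsilon-greedy action selection. The state and action counts, the learning rate `alpha`, the discount `gamma`, and the function names are all assumptions made for this example, not values or code from the cited work.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step to q[state][action] in place.

    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    best_next = max(q[next_state])              # greedy value of the next state
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]

def epsilon_greedy(q, state, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical setup: 3 discretized pole-angle states, 2 servo actions.
q_table = [[0.0, 0.0] for _ in range(3)]
q_update(q_table, state=1, action=0, reward=1.0, next_state=2)
```

With `epsilon=0.0` the selection is purely greedy, so after the single update above `epsilon_greedy(q_table, 1, epsilon=0.0)` picks action 0.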

“…Intuitively, the skill-practice distribution can alleviate this, associating states with certain skills which are important, making the skill learning process easier. We investigate this by analyzing DADS-Off (Sharma et al, 2020a), an off-policy improved version of DADS, and resampling skills every K timesteps. This simulates K-step rollouts from arbitrary starting states, i.e.…”

Section: Online Skill Learning With a Model (mentioning)

confidence: 99%

“…In particular, we can consider the intrinsic reward, which is a proxy for the diversity of skills. Since the intrinsic reward is calculated under the model, which changes and has inaccuracies, high intrinsic reward under the model is not always the best indicator, whereas it is a more reliable metric when learned from real world transitions as in Sharma et al (2020a). Also, the intrinsic reward is sampled according to an expectation given by the skill-practice distribution, so these numbers cannot be directly compared with those given by the Sharma et al (2020a) paper.…”

Section: B Further Plots For Learning Dynamics (mentioning)

confidence: 99%

Reset-Free Lifelong Learning with Skill-Space Planning

Grover, Abbeel, Mordatch 2020

Preprint

The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.
