Robotics: Science and Systems XIV 2018
DOI: 10.15607/rss.2018.xiv.008
Asymmetric Actor Critic for Image-Based Robot Learning

Abstract: Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, robotics poses many challenges for RL; most notably, training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working …

Cited by 159 publications (153 citation statements)
References 39 publications

“…While we are motivated to devise sidekick policy learning for active visual exploration, it is more generally applicable whenever an RL agent can access greater observability during training than during deployment. For example, agents may operate on first-person observations during test-time, yet have access to multiple sensors during training in simulation environments (38)(39)(40). Similarly, an active object recognition system (8,10,11,25,29) can only see its previously selected views of the object; yet if trained with CAD models, it could observe all possible views while learning.…”
Section: Results (mentioning)
confidence: 99%
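
The statement above describes exploiting privileged observability during training, which is the core idea of the indexed paper: a critic trained on the full simulator state guides an actor that only sees the images available at deployment. Below is a minimal sketch of that asymmetry; the network shapes and the DDPG-style update are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of asymmetric actor-critic: the critic consumes the full
# simulator state (training only), the actor consumes raw pixels
# (the observation available at test time). Sizes are illustrative.
import torch
import torch.nn as nn

class ImageActor(nn.Module):
    """Policy that acts from raw pixels (available at deployment)."""
    def __init__(self, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),
        )

    def forward(self, image):
        return self.net(image)

class StateCritic(nn.Module):
    """Q-function that sees the full simulator state (training only)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

actor = ImageActor(action_dim=4)
critic = StateCritic(state_dim=20, action_dim=4)

# One illustrative actor update on a fake batch: the actor is improved
# through the privileged critic (deterministic policy gradient).
image = torch.randn(8, 3, 64, 64)   # actor input: pixels
state = torch.randn(8, 20)          # critic input: privileged state
action = actor(image)
actor_loss = -critic(state, action).mean()
actor_loss.backward()
```

At deployment only `ImageActor` is kept; the privileged critic exists purely to shape training in simulation.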
“…As done in [10,11,14,15,16] we use the domain parameter distribution as a prior, which ensures the physical plausibility of each parameter. Note that specifying this distribution in the current state-of-the-art requires the researcher to make design decisions.…”
Section: Required Randomized Simulators (mentioning)
confidence: 99%
“…However, when transferred to real-world robotic systems, most of these methods become less attractive due to high sample complexity and a lack of explainability of state-of-the-art deep RL algorithms. As a consequence, the research field of domain randomization has recently been gaining interest [10,11,12,13,14,15,16,17]. This class of approaches promises to transfer control policies learned in simulation (source domain) to the real world (target domain) by randomizing the simulator's parameters (e.g., masses, extents, or friction coefficients) and hence train from a set of models instead of just one nominal model.…”
Section: Introduction (mentioning)
confidence: 99%
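
The two statements above both describe training on a set of simulator models drawn from a plausibility-preserving prior over physical parameters. A minimal sketch of that loop follows; the parameter names, ranges, and the `make_sim` / `policy_update` helpers are hypothetical placeholders, and picking the ranges is exactly the design decision the first statement points out.

```python
# Domain randomization over physical parameters: each episode runs on a
# simulator configured by a fresh sample from a bounded prior, so the
# policy is trained on a set of models rather than one nominal model.
import random

# Prior over simulator parameters: bounded uniform ranges encode the
# researcher's judgment about what is physically plausible (hypothetical).
PARAM_PRIOR = {
    "mass_kg":        (0.5, 2.0),    # link mass
    "friction_coeff": (0.4, 1.2),    # surface friction
    "extent_m":       (0.08, 0.12),  # object size
}

def sample_domain_params():
    """Draw one physically plausible simulator configuration."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in PARAM_PRIOR.items()}

def train(num_episodes, make_sim, policy_update):
    for _ in range(num_episodes):
        params = sample_domain_params()   # new model every episode
        sim = make_sim(**params)          # hypothetical simulator factory
        trajectory = sim.rollout()        # collect experience in this model
        policy_update(trajectory)         # policy must work across all draws
```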
“…Domain adaptation methods either map both image spaces into a common one [14], [18] or map one into the other [15]. Domain randomization methods add noise to the synthetic images [21], [28], thus making the control policy robust to different textures and lighting. The second line of work is attractive due to its effectiveness and simplicity.…”
Section: Related Work (mentioning)
confidence: 99%
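
As a concrete illustration of the second line of work, the sketch below perturbs each synthetic frame's lighting and color and adds pixel noise before the frame reaches the policy; the jitter ranges are illustrative assumptions, not values from [21] or [28].

```python
# Visual domain randomization: randomize lighting, color, and noise on
# every simulated frame so the policy cannot overfit to one rendering.
import numpy as np

rng = np.random.default_rng()

def randomize_image(image):
    """Apply random lighting/color/noise to an HxWx3 float image in [0, 1]."""
    img = image.astype(np.float32)
    img = img * rng.uniform(0.6, 1.4)                  # global brightness (lighting)
    img = img * rng.uniform(0.8, 1.2, size=(1, 1, 3))  # per-channel tint (color shift)
    img = img + rng.normal(0.0, 0.02, size=img.shape)  # sensor-like pixel noise
    return np.clip(img, 0.0, 1.0)

# Usage: augment each synthetic frame before feeding it to the policy.
frame = rng.uniform(size=(64, 64, 3)).astype(np.float32)
augmented = randomize_image(frame)
```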