2018
DOI: 10.48550/arxiv.1812.06298
Preprint

Residual Policy Learning

Abstract: We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial observability, sen…
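
The core idea in the abstract can be sketched in a few lines: the executed action is the fixed base controller's action plus a learned correction. Below is a minimal sketch assuming a continuous action space where the two can be summed; `base_controller` and `residual_net` are hypothetical placeholders, not names from the paper, and the residual network would in practice be trained with a model-free deep RL algorithm.

```python
import numpy as np

class ResidualPolicy:
    """Sketch of a residual policy: action = base action + learned residual.

    `base_controller` is a fixed, possibly nondifferentiable controller;
    `residual_net` stands in for a network trained with model-free deep RL.
    Both names are hypothetical, introduced only for this illustration.
    """

    def __init__(self, base_controller, residual_net):
        self.base_controller = base_controller
        self.residual_net = residual_net

    def act(self, observation):
        base_action = self.base_controller(observation)
        correction = self.residual_net(observation)
        return base_action + correction


# Toy usage: a hand-coded proportional controller plus an (untrained) residual.
base = lambda obs: -0.5 * obs              # imperfect hand-designed controller
residual = lambda obs: np.zeros_like(obs)  # placeholder for a trained network
policy = ResidualPolicy(base, residual)
print(policy.act(np.array([0.2, -0.1])))
```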

Cited by 53 publications (87 citation statements)
References 18 publications
“…This can shorten the start-up time of the agent immensely. In their work, Silver et al [19] present an approach for a so-called "Expert Exploration". Here, the algorithm learns based on a previously imperfect solution.…”
Section: Design of Rewarding and Exploration Strategy
confidence: 99%
“…The original ResNet [23,24] drew on this motivation, with shortcut connections. Johannink et al [27] and Silver et al [48] proposed Residual Reinforcement Learning, whereby the RL problem is split into a user-designed controller using engineering principles and a flexible neural network policy learned with RL. Similarly, in modeling dynamical systems, one approach is to incorporate a base parametric form informed by models from physics or biology, and only learn a neural network to fit the delta between the simple model and reality [28,36].…”
Section: Related Work
confidence: 99%
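
The residual-modeling idea in the statement above follows the same additive pattern on the dynamics side: a simple parametric model supplies a base prediction, and a network is fit only to the delta between that model and reality. A minimal sketch, assuming both components map a (state, action) pair to a next-state prediction; `physics_model` and `delta_net` are hypothetical names introduced here for illustration.

```python
import numpy as np

def predict_next_state(state, action, physics_model, delta_net):
    """Hybrid dynamics model: parametric base prediction plus learned delta.

    `physics_model` encodes a simple first-principles model (e.g. rigid-body
    equations); `delta_net` stands in for a network fit to the residual
    between the base model's predictions and observed transitions.
    """
    base_prediction = physics_model(state, action)
    learned_delta = delta_net(state, action)
    return base_prediction + learned_delta


# Toy usage with placeholder models.
physics = lambda s, a: s + 0.1 * a              # crude linear base model
delta = lambda s, a: np.zeros_like(s)           # placeholder for a trained net
print(predict_next_state(np.array([1.0, 0.0]), np.array([0.5, 0.5]),
                         physics, delta))
```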
“…Recently introduced for robot control, residual reinforcement learning trains an RL controller residually on top of an imperfect, traditional controller [25], [26]. The RL algorithm leverages the traditional controller as an initialization to enable data-efficient reinforcement learning for tasks where traditional RL is intractable, such as robotic insertion tasks where rewards are sparse [29].…”
Section: B. Constrained Residual Reinforcement Learning
confidence: 99%
“…This architecture, coined residual reinforcement learning (RRL), has been explored in earlier research and results in an efficient, safe and optimal control design. RRL has been introduced recently to alleviate the exploration needs and increase tractability in terms of data-efficiency for data-driven robot control [25], [26]. By applying the reinforcement learning algorithm residually on a base controller that roughly approaches the control objective, the base controller 'guides' the reinforcement learning algorithm to an approximate solution, accelerating training.…”
Section: Introduction
confidence: 99%