2018
DOI: 10.48550/arxiv.1812.06298
Preprint

Residual Policy Learning

Abstract: We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial observability, sen…
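
The core idea in the abstract can be sketched in a few lines: the executed action is the fixed base controller's action plus a learned correction. Below is a minimal sketch assuming a continuous action space where the two can be summed; `base_controller` and `residual_net` are hypothetical placeholders, not names from the paper, and the residual network would in practice be trained with a model-free deep RL algorithm.

```python
import numpy as np

class ResidualPolicy:
    """Sketch of a residual policy: action = base action + learned residual.

    `base_controller` is a fixed, possibly nondifferentiable controller;
    `residual_net` stands in for a network trained with model-free deep RL.
    Both names are hypothetical, introduced only for this illustration.
    """

    def __init__(self, base_controller, residual_net):
        self.base_controller = base_controller
        self.residual_net = residual_net

    def act(self, observation):
        base_action = self.base_controller(observation)
        correction = self.residual_net(observation)
        return base_action + correction


# Toy usage: a hand-coded proportional controller plus an (untrained) residual.
base = lambda obs: -0.5 * obs              # imperfect hand-designed controller
residual = lambda obs: np.zeros_like(obs)  # placeholder for a trained network
policy = ResidualPolicy(base, residual)
print(policy.act(np.array([0.2, -0.1])))
```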

Cited by 53 publications (87 citation statements)
References 18 publications
“…This can shorten the start-up time of the agent immensely. In their work, Silver et al [19] present an approach for a so-called "Expert Exploration". Here, the algorithm learns based on a previously imperfect solution.…”
Section: Design of Rewarding and Exploration Strategy
confidence: 99%
“…The original ResNet [23,24] drew on this motivation, with shortcut connections. Johannink et al [27] and Silver et al [48] proposed Residual Reinforcement Learning, whereby the RL problem is split into a user-designed controller using engineering principles and a flexible neural network policy learned with RL. Similarly, in modeling dynamical systems, one approach is to incorporate a base parametric form informed by models from physics or biology, and only learn a neural network to fit the delta between the simple model and reality [28,36].…”
Section: Related Work
confidence: 99%
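
The residual-modeling idea in the statement above follows the same additive pattern on the dynamics side: a simple parametric model supplies a base prediction, and a network is fit only to the delta between that model and reality. A minimal sketch, assuming both components map a (state, action) pair to a next-state prediction; `physics_model` and `delta_net` are hypothetical names introduced here for illustration.

```python
import numpy as np

def predict_next_state(state, action, physics_model, delta_net):
    """Hybrid dynamics model: parametric base prediction plus learned delta.

    `physics_model` encodes a simple first-principles model (e.g. rigid-body
    equations); `delta_net` stands in for a network fit to the residual
    between the base model's predictions and observed transitions.
    """
    base_prediction = physics_model(state, action)
    learned_delta = delta_net(state, action)
    return base_prediction + learned_delta


# Toy usage with placeholder models.
physics = lambda s, a: s + 0.1 * a              # crude linear base model
delta = lambda s, a: np.zeros_like(s)           # placeholder for a trained net
print(predict_next_state(np.array([1.0, 0.0]), np.array([0.5, 0.5]),
                         physics, delta))
```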
“…Recently introduced for robot control, residual reinforcement learning trains an RL controller residually on top of an imperfect, traditional controller [25], [26]. The RL algorithm leverages the traditional controller as an initialization to enable data-efficient reinforcement learning for tasks where traditional RL is intractable, such as robotic insertion tasks where rewards are sparse [29].…”
Section: B. Constrained Residual Reinforcement Learning
confidence: 99%
“…This architecture, coined residual reinforcement learning (RRL), has been explored in earlier research and results in an efficient, safe and optimal control design. RRL has been introduced recently to alleviate the exploration needs and increase tractability in terms of data-efficiency for data-driven robot control [25], [26]. By applying the reinforcement learning algorithm residually on a base controller that roughly approaches the control objective, the base controller 'guides' the reinforcement learning algorithm to an approximate solution, accelerating training.…”
Section: Introduction
confidence: 99%