2018 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2018.8460547

OptLayer - Practical Constrained Optimization for Deep Reinforcement Learning in the Real World

Abstract: While deep reinforcement learning techniques have recently produced considerable achievements on many decision-making problems, their use in robotics has largely been limited to simulated worlds or restricted motions, since unconstrained trial-and-error interactions in the real world can have undesirable consequences for the robot or its environment. To overcome such limitations, we propose a novel reinforcement learning architecture, OptLayer, that takes as inputs possibly unsafe actions predicted by a neural…

Cited by 93 publications (82 citation statements)
References 9 publications
“…First, by improving the capability of the corrective controller, a larger safe region can be acquired, which reduces the conflicts between safety and learning performance. Second, safety can also be incorporated in the reward function of the learning algorithm [48]. By encouraging safe behaviors, the learning-based controller tends to stay within the safe region such that less guidance is needed from the supervisor.…”
Section: B. Safety and Learning Performance
confidence: 99%
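As a rough illustration of the reward-shaping idea attributed to [48] above, a penalty proportional to the amount of constraint violation can be subtracted from the task reward so that the learned policy is nudged toward the safe region. The sketch below is an assumption of ours, not code from the cited work; the function name, the convention g_i(s, a) <= 0 for "safe", and the penalty weight are all hypothetical.

import numpy as np

def shaped_reward(task_reward, constraint_values, penalty_weight=10.0):
    """Subtract a safety penalty from the task reward.

    constraint_values: array of g_i(s, a); positive entries indicate
    violated constraints under the convention g_i(s, a) <= 0 means safe.
    """
    violation = np.maximum(constraint_values, 0.0).sum()
    return task_reward - penalty_weight * violation

With this shaping, trajectories that stay inside the safe set keep their original return, while violations are discouraged in proportion to their magnitude.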
“…Recently, Pham, De Magistris, and Tachibana (2018) have suggested enforcing constraints by projecting any unconstrained point onto the constrained space by solving an optimisation program that minimises the L2 distance and backpropagating through it to train the network (Amos and Kolter 2017). This approach is very time-consuming, as it requires solving a quadratic program (QP) in the forward pass at every training iteration and, as a result, does not scale to problems with large-dimensional action spaces (Amos and Kolter 2017) seen in practical screening problems.…”
Section: Constrained Action-Space RL
confidence: 99%
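For concreteness, the L2 projection described above amounts to the QP: minimise ||a - a_raw||^2 over actions a subject to the chosen constraints. The sketch below solves this QP with cvxpy, assuming linear inequality and equality constraints, and shows only the forward pass; backpropagating through the projection, as OptLayer does (following Amos and Kolter 2017), would additionally require a differentiable QP solver and is not shown here.

import cvxpy as cp
import numpy as np

def project_action(a_raw, G, h, A=None, b=None):
    """Return the closest action to a_raw (in L2 distance) satisfying
    G a <= h and, optionally, A a == b, by solving a small QP."""
    a = cp.Variable(a_raw.shape[0])
    constraints = [G @ a <= h]
    if A is not None:
        constraints.append(A @ a == b)
    problem = cp.Problem(cp.Minimize(cp.sum_squares(a - a_raw)), constraints)
    problem.solve()
    return a.value

# Hypothetical usage: clip a 2-D action into the box [-1, 1]^2.
G_box = np.vstack([np.eye(2), -np.eye(2)])
h_box = np.ones(4)
print(project_action(np.array([2.0, -0.3]), G_box, h_box))  # approx. [1.0, -0.3]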
“…Our RL approach is similar in spirit to Bhatia, Varakantham, and Kumar (2019), which uses a complicated variable-length iterative approximation of the L2 projection to deal with a specific subset of linear constraints faster than Pham, De Magistris, and Tachibana (2018). The linear constraints they can handle are constraints on the sums of sets of variables, where the sets must form a hierarchy.…”
Section: Constrained Action-Space RL
confidence: 99%
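To illustrate what "constraints on the sums of sets of variables, where the sets form a hierarchy" can look like, the snippet below encodes such constraints as rows of G a <= h for a 4-dimensional action split into two groups nested inside a root group; all indices, capacities, and names are hypothetical.

import numpy as np

# Hypothetical hierarchy over a 4-dimensional action vector:
# two leaf groups nested inside a single root group.
hierarchy = {"all": [0, 1, 2, 3], "group_a": [0, 1], "group_b": [2, 3]}
capacities = {"all": 1.0, "group_a": 0.6, "group_b": 0.6}

# Each hierarchical sum constraint becomes one row of G a <= h.
G = np.zeros((len(hierarchy), 4))
h = np.zeros(len(hierarchy))
for row, (name, indices) in enumerate(hierarchy.items()):
    G[row, indices] = 1.0
    h[row] = capacities[name]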
“…In our study, we trained DRL controllers in a purely model-free fashion, i.e., no specific knowledge of the data-center operation was pre-encoded in the neural networks or the training process; instead, only abstract state and action vectors were used as inputs and outputs. Note that models can be used to accelerate training when available [10], and the quality of the resulting policy ultimately depends on the quality of the models. Domain adaptation and transfer learning [13,4] constitute important steps towards the application of such model-based DRL techniques.…”
Section: Model-Free Reinforcement Learning
confidence: 99%