2020
DOI: 10.48550/arxiv.2010.06324
Preprint

Balancing Constraints and Rewards with Meta-Gradient D4PG

Dan A. Calian,
Daniel J. Mankowitz,
Tom Zahavy
et al.

Abstract: Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. Often the constraint thresholds are incorrectly set due to the complex nature of a system or the inability to verify the thresholds offline (e.g., no simulator or reasonable offline evaluation procedure exists). This results in solutions where a task cannot be solved without violating the constraints. However, in many real-world cases, constraint violations are undesirable yet they…


Cited by 3 publications (4 citation statements)
References 7 publications
“…In previous work, Calian et al (2020) tune the learning rate of the Lagrange multipliers to automatically turn some constraints into soft-constraints when the agent is not able to satisfy them after a given period of time. The bootstrap constraint instead allows us to start making some progress on the main task without turning our hard constraints into soft constraints.…”
Section: Bootstrap Constraint
confidence: 99%
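
The adaptive-multiplier idea described in the quote above can be made concrete with a small sketch. The snippet below is purely illustrative (the function name update_multiplier, the decaying learning-rate schedule, and all variable names are assumptions, not code from Calian et al.): the Lagrange multiplier is updated by gradient ascent on the constraint violation, and shrinking its learning rate, here a simple stand-in for the meta-gradient tuning in the cited work, effectively softens a constraint the agent cannot satisfy.

# Illustrative sketch only: one gradient-ascent step on a single Lagrange
# multiplier, with a learning rate treated as tunable. In the cited work this
# rate is adapted automatically (via meta-gradients); a fixed decay stands in
# for that mechanism here.
def update_multiplier(lmbda, avg_cost, threshold, lr):
    violation = avg_cost - threshold   # > 0 means the constraint is violated
    lmbda = lmbda + lr * violation     # ascend on the dual variable
    return max(lmbda, 0.0)             # multipliers stay non-negative

# Toy usage: with a persistently violated constraint, decaying the learning
# rate keeps the multiplier (and hence the penalty) bounded, so the hard
# constraint behaves like a soft one.
lmbda, lr = 0.0, 0.1
for step in range(100):
    avg_cost = 1.5                     # pretend the agent never satisfies it
    lmbda = update_multiplier(lmbda, avg_cost, threshold=1.0, lr=lr)
    lr *= 0.95                         # hypothetical stand-in for a meta-learned schedule
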
“…Most of these works tackle CMDPs from the perspective of Safe RL, which seeks to minimize the total regret over the cost functions throughout training (Ray, Achiam, and Amodei 2019) and focus on the single-constraint case (Zhang, Vuong, and Ross 2020; Dalal et al. 2018; Calian et al. 2020) or aggregate various types of events under a single constraint (Stooke, Achiam, and Abbeel 2020; Ray, Achiam, and Amodei 2019). In this work, we focus our attention on the potential of CMDPs for precise and intuitive behavior specification and work on the problem of satisfying many constraints simultaneously.…”
Section: Related Work: Constrained Reinforcement Learning
confidence: 99%
“…Learning RL policies under safety constraints [12,13,7] has become an important topic in the community due to safety concerns in real-world applications. Many methods based on constrained optimization have been developed, such as trust region methods [5], Lagrangian methods [5,6,14], barrier methods [15,16], Lyapunov methods [4,17], etc. Another direction is based on the safety critic, where an additional value estimator is learned to predict cost, apart from the primal critic estimating the discounted return [7,18].…”
Section: Related Work
confidence: 99%
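
As a rough illustration of the safety-critic idea mentioned in the quote above, the sketch below pairs the usual return critic with a second critic that estimates discounted cost. It is an assumed, minimal PyTorch architecture (the class name TwinCritic and all layer sizes are illustrative), not an implementation from any of the cited papers.

import torch
import torch.nn as nn

class TwinCritic(nn.Module):
    """Reward critic and safety (cost) critic over the same (s, a) input."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        def mlp():
            return nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
        self.reward_q = mlp()  # Q_r(s, a): estimate of the discounted return
        self.cost_q = mlp()    # Q_c(s, a): estimate of the discounted cost

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.reward_q(x), self.cost_q(x)

# A Lagrangian-style policy loss can then trade the two estimates off, e.g.
#   policy_loss = -(q_r - lmbda * q_c).mean()
# where lmbda is a (fixed or learned) Lagrange multiplier.
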
“…A wide variety of constrained reinforcement learning frameworks have been proposed to solve constrained MDPs (CMDPs) [43]. They either convert a CMDP into an unconstrained min-max problem by introducing Lagrangian multipliers [12,14,44-48], or seek to obtain the optimal policy by directly solving constrained optimization problems [11,13,18-20,49-51]. However, it is hard to scale these single-agent methods to our multi-agent setting due to computational inefficiency.…”
Section: Related Work
confidence: 99%
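
For reference, the Lagrange-multiplier conversion mentioned in the quote above is the textbook relaxation of a CMDP; the notation below (J_r for the return objective, J_{c_i} and d_i for each constraint's cost and threshold, lambda_i for the multipliers) is generic and not taken from any particular cited paper.

\[
  \max_{\pi}\; J_r(\pi)
  \quad \text{s.t.} \quad
  J_{c_i}(\pi) \le d_i,\quad i = 1,\dots,k
\]
is relaxed into the unconstrained min-max problem
\[
  \min_{\lambda \ge 0}\;\max_{\pi}\;
  J_r(\pi) - \sum_{i=1}^{k} \lambda_i \bigl( J_{c_i}(\pi) - d_i \bigr),
\]
where ascending on \(\pi\) and descending on \(\lambda\) recovers the primal-dual methods referenced above.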