Learning Safe Policies via Primal-Dual Methods

Paternain, Santiago; Calvo-Fullana, Miguel; Chamon, Luiz F. O.; Ribeiro, Alejandro

doi:10.1109/cdc40024.2019.9029423

Cited by 32 publications

(31 citation statements)

References 14 publications

“…μ_i (E[ℓ_i(φ(x), y)] − c_i) ≥ P, (18) for all φ ∈ H. Note, however, that the left-hand side of (18) is the Lagrangian (8), i.e., (18) implies that L(φ, μ) ≥ P. In particular, this holds for the minimum of L(φ, μ), implying that D ≥ P and therefore, that strong duality holds for (P-CSL).…”

Section: Discussion (mentioning)

confidence: 99%
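
The step from (18) to strong duality in the statement above can be written out as a short chain of inequalities. This is a sketch under an assumption: the excerpt does not show the Lagrangian (8), so the standard form with an objective loss ℓ_0 is assumed here, and μ denotes the multiplier appearing in (18).

```latex
\begin{align*}
% Assumed standard form of the Lagrangian (8); not shown in the excerpt.
L(\phi, \mu) &= \mathbb{E}[\ell_0(\phi(x), y)]
  + \sum_i \mu_i \left( \mathbb{E}[\ell_i(\phi(x), y)] - c_i \right),
  \qquad \mu_i \geq 0 \\
% Weak duality: for any feasible \phi each constraint term is non-positive,
% so L(\phi, \mu) is at most the objective value, and therefore
D &= \max_{\mu \geq 0} \, \min_{\phi \in \mathcal{H}} L(\phi, \mu) \leq P \\
% The excerpt: L(\phi, \mu) \geq P for all \phi, hence for the minimizer too.
\min_{\phi \in \mathcal{H}} L(\phi, \mu) &\geq P
  \;\Longrightarrow\; D \geq P
  \;\Longrightarrow\; D = P
\end{align*}
```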

Constrained Learning with Non-Convex Losses

Chamon, Paternain, Calvo-Fullana et al. 2021

Preprint

Self Cite

Though learning has become a core technology of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced solutions. The need to impose requirements on learning is therefore paramount, especially as it reaches critical applications in social, industrial, and medical domains. However, the non-convexity of most modern learning problems is only exacerbated by the introduction of constraints. Whereas good unconstrained solutions can often be learned using empirical risk minimization (ERM), even obtaining a model that satisfies statistical constraints can be challenging, all the more so a good one. In this paper, we overcome this issue by learning in the empirical dual domain, where constrained statistical learning problems become unconstrained, finite dimensional, and deterministic. We analyze the generalization properties of this approach by bounding the empirical duality gap, i.e., the difference between our approximate, tractable solution and the solution of the original (non-convex) statistical problem, and provide a practical constrained learning algorithm. These results establish a constrained counterpart of classical learning theory and enable the explicit use of constraints in learning. We illustrate this algorithm and theory in rate-constrained learning applications.
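
As a rough illustration of the dual-domain idea described in the abstract above, the following sketch runs dual ascent on a toy one-dimensional constrained problem. It is not the algorithm of the cited paper: the problem (minimize (θ − 2)² subject to θ ≤ 1), the function name `primal_dual_toy`, and the step size are all assumptions chosen so that the inner minimization has a closed form.

```python
def primal_dual_toy(steps=200, dual_lr=0.5):
    """Dual ascent on: minimize (theta - 2)^2 subject to theta - 1 <= 0.

    Hypothetical toy problem, used only to show the primal/dual alternation.
    """
    mu = 0.0      # dual variable (Lagrange multiplier), kept non-negative
    theta = 2.0   # primal variable, starts at the unconstrained minimizer
    for _ in range(steps):
        # Primal step: exact minimizer over theta of the Lagrangian
        # L(theta, mu) = (theta - 2)^2 + mu * (theta - 1).
        theta = 2.0 - mu / 2.0
        # Dual step: projected gradient ascent on mu, driven by the
        # constraint slack (positive when the constraint is violated).
        mu = max(0.0, mu + dual_lr * (theta - 1.0))
    return theta, mu


theta, mu = primal_dual_toy()
print(theta, mu)
```

For this toy problem the iterates approach the constrained optimum θ = 1 with multiplier μ = 2. Once the multiplier is fixed, the inner problem is unconstrained, which is the property the abstract highlights.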

“…As these systems become ubiquitous, however, so does the need to constrain their behavior to tackle problems in fairness [7]-[12], robustness [13]-[15], and safety [16]-[18]. Left untethered, learning can lead to biased, prejudiced models or systems prone to tampering (e.g., adversarial examples), and unsafe behaviors [19]-[22].…”

Section: Introduction (mentioning)

confidence: 99%

Constrained Learning with Non-Convex Losses

Chamon, Paternain, Calvo-Fullana et al. 2021

Preprint

Self Cite

“…However, it turns out that the gradient ∇_θ p_s(π) is rather difficult to compute, which is also a major challenge in chance constrained problems [20], [23]. Previous researchers in chance constrained RL usually replace ∇_θ p_s(π) with the gradient of a lower bound of p_s without sufficient theoretical guarantees [14], [15]. In this paper, we introduce an analytical approximated gradient with theoretical basis [23].…”

Section: B Analytical Gradient For Safe Probability (mentioning)

confidence: 99%

“…Recently, some RL researchers begin to investigate including different forms of safety constraints in RL algorithms to improve safety for real-world applications [10]-[13]. One of the most popular forms is the chance constraint, which constrains the possibility of the control policy violating the state constraint below a given level [10], [14], [15]. Chance constraint gives an intuitive and quantitative measure of the safety level of the control policy, so it is suitable to represent the safety demands in real-world systems with uncertainty.…”

Section: Introduction (mentioning)

confidence: 99%

“…1(a), a large penalty is prone to rapid oscillations and does not converge to a safe policy, while a small penalty cannot satisfy the constraint [16]. The second approach is the Lagrangian method [10], [15], which is widely used in constrained optimization. Actually, it can be regarded as the penalty method with an adaptive weight, which is dynamically adjusted by safety level rather than fixed.…”

Section: Introduction (mentioning)

confidence: 99%

Separated Proportional-Integral Lagrangian for Chance Constrained Reinforcement Learning

Peng, Mu, Duan et al. 2021

Preprint

Safety is essential for reinforcement learning (RL) applied in real-world tasks like autonomous driving. Chance constraints which guarantee the satisfaction of state constraints at a high probability are suitable to represent the requirements in real-world environment with uncertainty. Existing chance constrained RL methods like the penalty method and the Lagrangian method either exhibit periodic oscillations or cannot satisfy the constraints. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. Taking a control perspective, we first interpret the penalty method and the Lagrangian method as proportional feedback and integral feedback control, respectively. Then, a proportional-integral Lagrangian method is proposed to steady learning process while improving safety. To prevent integral overshooting and reduce conservatism, we introduce the integral separation technique inspired by PID control. Finally, an analytical gradient of the chance constraint is utilized for model-based policy optimization. The effectiveness of SPIL is demonstrated by a narrow car-following task. Experiments indicate that compared with previous methods, SPIL improves the performance while guaranteeing safety, with a steady learning process.
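
The control-theoretic reading in the abstract above (penalty as proportional feedback, Lagrangian as integral feedback) can be sketched as a multiplier update. This is a minimal sketch using the classical integral-separation rule from PID control, where the integral term accumulates only while the error is small; it is not taken from the SPIL paper, and the class name `PIMultiplier`, the gains, and the threshold are hypothetical.

```python
class PIMultiplier:
    """Proportional-integral penalty weight with integral separation.

    Hypothetical illustration: `violation` is the constraint error
    (positive when the safety constraint is violated).
    """

    def __init__(self, kp=1.0, ki=0.1, sep_threshold=0.5):
        self.kp = kp                        # proportional gain (penalty-like term)
        self.ki = ki                        # integral gain (Lagrangian-like term)
        self.sep_threshold = sep_threshold  # assumed separation threshold
        self.integral = 0.0                 # accumulated violation, kept >= 0

    def update(self, violation):
        # Integral separation: accumulate only when the error is small,
        # so large transient errors do not wind up the integral term.
        if abs(violation) <= self.sep_threshold:
            self.integral = max(0.0, self.integral + violation)
        # The resulting weight on the constraint is clipped to be non-negative.
        return max(0.0, self.kp * violation + self.ki * self.integral)


multiplier = PIMultiplier()
for violation in [2.0, 0.2, -1.0]:
    print(multiplier.update(violation))
```

With `ki = 0` the weight reduces to a pure proportional penalty, and with `kp = 0` and a very large threshold it reduces to a plain integral (Lagrangian-style) update.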

Declined placental PLAC1 expression is involved in preeclampsia

et al. 2019

Background: This study aimed to clarify the change of the expression of placenta-specific 1 (PLAC1) in the placenta of preeclamptic women and to explore the regulatory effects on trophoblasts by PLAC1. Methods: Nineteen women with preeclampsia and 19 with normal pregnancies were recruited, and then we determined the expression of PLAC1 by immunohistochemistry (IHC) and Western blotting. To observe the effect of hypoxia on the expression of PLAC1, trophoblasts were cultured at the normoxia or hypoxia condition. Small interference of ribonucleic acid (siRNA) was used to silence PLAC1. The proliferation, migration and invasion of trophoblasts were evaluated with cell counting kit-8 and transwell analysis, and the apoptosis of trophoblasts was evaluated by flow cytometry with FITC and PI staining. Results: Placental PLAC1 expression was significantly decreased in severe preeclampsia compared with control (P < .001). The expression of PLAC1 in trophoblasts was significantly decreased after treatment with low oxygen concentration (P = .018). PLAC1 siRNA significantly inhibited the proliferation (P < .001), the migration (P < .001) and invasion (P < .001) of trophoblasts, but increased the apoptosis (P = .004 for Swan-71; P = .031 for Jar). Conclusions: The expression of PLAC1 declined in preeclampsia and this inhibited the function of trophoblasts, suggesting PLAC1 may play a role in the development of preeclampsia.
