2019
DOI: 10.48550/arxiv.1910.13393
Preprint

Constrained Reinforcement Learning Has Zero Duality Gap

Cited by 9 publications (15 citation statements)
References 0 publications
“…• We show that adversarial training is equivalent to a stochastic optimization problem over a specific, non-atomic distribution, which we characterize using recent non-convex duality results [59,60]. Further, we show that a myriad of previous adversarial attacks reduce to particular, sub-optimal choices of this distribution.…”
Section: Introduction (mentioning)
confidence: 82%
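The reduction this statement describes can be written schematically. The display below is an illustrative paraphrase, not the cited paper's exact notation; ℓ, f_θ, Δ, and p* stand for the loss, the classifier, the perturbation set, and the characterized optimal distribution:

```latex
\min_{\theta}\ \mathbb{E}_{(x,y)}\Big[\max_{\delta \in \Delta} \ell\big(f_\theta(x+\delta),\, y\big)\Big]
\;=\;
\min_{\theta}\ \mathbb{E}_{(x,y)}\,\mathbb{E}_{\delta \sim p^\star(\cdot \mid x,y)}\Big[\ell\big(f_\theta(x+\delta),\, y\big)\Big]
```

On this reading, a standard attack such as PGD corresponds to a point-mass (atomic) choice of the inner distribution, which is why the statement calls such choices sub-optimal relative to the non-atomic p*.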
“…and where f_θ ∈ F is a fixed classifier. Roughly speaking, this problem is challenging due to the fact that we cannot compute the normalization constant in (59). Therefore, although the form of (59) indicates that the amount of mass placed on δ ∈ ∆ will be proportional to the loss ℓ(f_θ(x), y) when the data is perturbed by this perturbation δ, it's unclear how we can sample from this distribution in practice.…”
Section: E Deriving the Langevin Monte Carlo Sampler (mentioning)
confidence: 99%
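For context, the sampler named in the section heading sidesteps exactly the normalization issue the quote raises: for a Gibbs-type density p(δ) ∝ exp(ℓ(δ)), the Langevin drift uses only ∇_δ log p(δ) = ∇_δ ℓ(δ), so the normalization constant never appears. Below is a minimal, self-contained sketch of unadjusted Langevin dynamics on a toy two-dimensional stand-in for the loss; the exp(ℓ) form, the toy loss, and all names are illustrative assumptions, not the cited paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the map delta -> loss at the perturbed data.
# Chosen so that exp(loss) is integrable (the density is proper).
def loss(delta):
    return -0.5 * np.sum(delta ** 2) + np.sum(np.sin(3.0 * delta))

def grad_loss(delta, eps=1e-5):
    # Central finite differences; a real implementation would use autograd.
    g = np.zeros_like(delta)
    for i in range(delta.size):
        e = np.zeros_like(delta)
        e[i] = eps
        g[i] = (loss(delta + e) - loss(delta - e)) / (2.0 * eps)
    return g

# Unadjusted Langevin dynamics targeting p(delta) ∝ exp(loss(delta)):
# grad log p = grad loss, so the normalization constant is never needed.
eta, steps, burn_in = 1e-2, 5000, 1000
delta = np.zeros(2)
samples = []
for t in range(steps):
    delta = delta + eta * grad_loss(delta) \
            + np.sqrt(2.0 * eta) * rng.standard_normal(delta.size)
    if t >= burn_in:
        samples.append(delta.copy())

print("sample mean:", np.mean(samples, axis=0))
```

The step size η trades discretization bias against mixing speed; a Metropolis correction (MALA) would remove the bias at the cost of an accept/reject step.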
“…Theorem 2 There exists m* < ∞ such that as long as the number of branches m ≥ m*, strong duality holds for problem (16). Namely, P^prl_lin = D^prl_lin.…”
Section: Proposition (mentioning)
confidence: 99%
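The identity P^prl_lin = D^prl_lin is the generic strong-duality conclusion: weak duality always supplies one direction via the max-min inequality, and a theorem of this kind upgrades it to equality. Schematically, for a maximization primal with Lagrangian L:

```latex
P \;=\; \max_{x}\ \min_{\lambda \ge 0} L(x, \lambda)
\;\le\;
\min_{\lambda \ge 0}\ \max_{x} L(x, \lambda) \;=\; D
```

Zero duality gap is precisely the assertion that this inequality holds with equality.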
“…Convex duality also has important applications in machine learning. In [16], the problem of designing an all-encompassing reward is formulated as a constrained reinforcement learning problem, which is shown to have zero duality gap. This property provides a theoretical convergence guarantee for the primal-dual algorithm used to solve the problem.…”
Section: Introduction (mentioning)
confidence: 99%
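Concretely, the constrained reinforcement learning problem of the indexed paper and its Lagrangian dual can be sketched as follows, where V_i(π) denotes an expected discounted return; the zero-duality-gap result states that P* = D* even though the policy optimization itself is non-convex:

```latex
P^\star \;=\; \max_{\pi}\ V_0(\pi) \quad \text{s.t.}\quad V_i(\pi) \ge c_i,\ i = 1,\dots,m,
\qquad
D^\star \;=\; \min_{\lambda \ge 0}\ \max_{\pi}\ V_0(\pi) + \sum_{i=1}^{m} \lambda_i \big(V_i(\pi) - c_i\big)
```

It is this equality that licenses solving the constrained problem through its dual with a primal-dual method.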
“…From the theoretical perspective, local and asymptotic convergence of various primal-dual algorithms has been studied in Chow et al. (2017, 2018); Tessler et al. (2018); Yu et al. (2019). Despite the objective being non-concave and the constraint set non-convex, it has been shown that the duality gap of a CMDP problem is zero (Altman, 1999; Paternain et al., 2019). Based on this property, Ding et al. (2020) analyze a primal-dual natural policy gradient algorithm for a CMDP with a single constraint and establish finite-time convergence to global optimality in the tabular policy setting and with a general smooth policy class.…”
Section: Related Work (mentioning)
confidence: 99%
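The primal-dual scheme these works analyze alternates an ascent step on the Lagrangian in the policy (primal) variable with a dual ascent step on the multiplier. The toy below is a minimal sketch, assuming a two-action bandit with known per-action rewards and costs in place of a full CMDP; all numbers and names are illustrative. It shows the two mechanics that matter: the multiplier grows while the cost constraint is violated, and the optimal policy randomizes, which is the convexification underlying the zero duality gap:

```python
import numpy as np

# Toy constrained problem: maximize E[reward] s.t. E[cost] <= budget.
rewards = np.array([1.0, 0.5])  # action 0 pays more...
costs = np.array([1.0, 0.1])    # ...but is also much costlier
budget = 0.4

theta = np.zeros(2)  # softmax policy logits (primal variable)
lam = 0.0            # Lagrange multiplier (dual variable)
lr_theta, lr_lam = 0.5, 0.5

for _ in range(2000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    u = rewards - lam * costs         # per-action Lagrangian payoff
    J = pi @ u                        # Lagrangian value under pi
    theta += lr_theta * pi * (u - J)  # exact softmax policy gradient
    # Projected dual ascent: raise lam while the constraint is violated.
    lam = max(0.0, lam + lr_lam * (pi @ costs - budget))

print("policy:", pi, " E[cost]:", pi @ costs, " lambda:", lam)
```

With these step sizes the iterates hover near the saddle point rather than converging exactly; the finite-time analyses cited above study precisely how fast such primal-dual iterates approach the optimum.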