2023
DOI: 10.48550/arxiv.2301.13589
Preprint

Policy Gradient for s-Rectangular Robust Markov Decision Processes

Abstract: We present a novel robust policy gradient method (RPG) for s-rectangular robust Markov Decision Processes (MDPs). We are the first to derive the adversarial kernel in closed form and demonstrate that it is a rank-one perturbation of the nominal kernel. This allows us to derive an RPG that is similar to the one used in non-robust MDPs, except with a robust Q-value function and an additional correction term. Both robust Q-values and correction terms are efficiently computable, thus the time complexity of our m…
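The truncated abstract does not reproduce the paper's formulas, so the following is only a schematic sketch in assumed notation (the nominal kernel $\bar{P}$, the factors $b$ and $k$, and the unspecified correction term are placeholders, not the paper's definitions). A rank-one perturbation of the nominal kernel has the form

\[ P_{\mathrm{adv}}(s' \mid s, a) \;=\; \bar{P}(s' \mid s, a) \;+\; b(s, a)\, k(s'), \]

and a robust policy gradient that keeps the non-robust shape, with a robust Q-value plus a correction, would read

\[ \nabla_{\theta} J(\pi_{\theta}) \;\approx\; \mathbb{E}_{s,a}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}_{\mathrm{robust}}(s, a) \right] \;+\; \text{correction term}. \]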

Cited by 1 publication (3 citation statements). References 5 publications.

Citation statements:
“…In fact, except for those considered in (Mannor, Mebel, and Xu 2016; Goyal and Grand-Clement 2023) which are locally coupled, s-rectangular uncertainty sets represent the largest class of tractable RMDPs. On the other hand, if not the studies (Xu and Mannor 2010; Mannor, Mebel, and Xu 2016; Derman, Geist, and Mannor 2021; Kumar et al. 2023) that treat both reward and transition uncertainty, RMDP literature has mostly focused just on transition uncertainty. We believe this is due to the greater challenge it represents, as the repercussions of transition ambiguity are epistemic and can lead to a butterfly effect: a small kernel deviation at some state can have an unpredictable effect on another state so we are no longer able to track how local kernel uncertainty propagates across the state space.…”
Section: Related Work
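For context on the terminology in this excerpt (standard RMDP definitions, not quoted from the citing paper): an uncertainty set $\mathcal{U}$ is $(s,a)$-rectangular when it factorizes independently across state-action pairs, and $s$-rectangular when it only factorizes across states, so uncertainties over different actions at the same state may be coupled:

\[ \mathcal{U} = \prod_{s, a} \mathcal{U}_{s,a} \quad \text{($(s,a)$-rectangular)}, \qquad \mathcal{U} = \prod_{s} \mathcal{U}_{s} \quad \text{($s$-rectangular)}. \]

Every $(s,a)$-rectangular set is also $s$-rectangular, which is why the excerpt describes $s$-rectangular sets as the largest tractable class apart from the locally coupled sets of Mannor, Mebel, and Xu (2016) and Goyal and Grand-Clement (2023).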
“…In that respect, the robust policy gradient methods recently introduced in (Wang and Zou 2022; Kumar et al. 2023; Li, Zhao, and Lan 2022) assume the uncertainty set to be rectangular. Although Wang and Zou (2022) did prove convergence in the non-rectangular case, their analysis exclusively focused on transition uncertainty while they assumed oracle access to the policy gradient.…”
Section: Related Work