2023
DOI: 10.1609/aaai.v37i6.25900

Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints

Abstract: We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, which plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments…
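As a rough formal sketch of the setting described in the abstract (the symbols r_k, g_k, P_k and the budgets B_r, B_g, B_p are illustrative notation assumed here, not necessarily the paper's own), the non-stationarity is typically captured by bounding the cumulative variation of the episode-k reward, utility, and transition kernel across the K episodes:

% Sketch of variation-budget conditions (illustrative notation, assumed rather than quoted from the paper)
\sum_{k=1}^{K-1} \max_{s,a} \bigl| r_{k+1}(s,a) - r_k(s,a) \bigr| \le B_r, \qquad
\sum_{k=1}^{K-1} \max_{s,a} \bigl| g_{k+1}(s,a) - g_k(s,a) \bigr| \le B_g,
\sum_{k=1}^{K-1} \max_{s,a} \bigl\| P_{k+1}(\cdot \mid s,a) - P_k(\cdot \mid s,a) \bigr\|_1 \le B_p .

A primal-dual method then, roughly, performs policy improvement on the Lagrangian \mathcal{L}(\pi,\lambda) = V^{\pi}_{r} + \lambda\,(V^{\pi}_{g} - b) while updating the multiplier \lambda \ge 0 by projected (sub)gradient steps, where b denotes the constraint threshold; this is a generic sketch rather than the paper's exact update rule.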

Cited by 2 publications (1 citation statement)
References 28 publications (56 reference statements)
“…Here the learner has access to a safe policy that can be deployed while the learner does not have sufficient knowledge of the safety constraint. This RL setting has been studied in tabular (Efroni et al., 2020; Liu et al., 2021a; Bura et al., 2022) and linear (Ding et al., 2021; Ghosh et al., 2022) MDPs. It is notable that Liu et al. (2021a) make use of the optimism-pessimism principle that we developed in our earlier work (Pacchiano et al., 2021) and used in the analysis of this paper.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
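To make the optimism-pessimism idea mentioned in the excerpt concrete, the following is a minimal, hedged sketch in Python: optimistic bonuses inflate the objective estimate while pessimistic bonuses deflate the constraint/utility estimate before checking feasibility. The function names, the Hoeffding-style bonus, and the threshold parameter are assumptions for illustration and do not reproduce the exact construction of Liu et al. (2021a) or Pacchiano et al. (2021).

import numpy as np

def exploration_bonus(counts, horizon, delta=0.05):
    # Hoeffding-style bonus from visit counts (illustrative choice of constants).
    return horizon * np.sqrt(np.log(2.0 / delta) / np.maximum(counts, 1))

def optimism_pessimism(r_hat, g_hat, counts, horizon, threshold):
    # Optimism for the objective, pessimism for the safety/utility constraint.
    b = exploration_bonus(counts, horizon)
    r_optimistic = np.clip(r_hat + b, 0.0, 1.0)   # upper bound used when maximizing reward
    g_pessimistic = np.clip(g_hat - b, 0.0, 1.0)  # lower bound used when certifying safety
    feasible = g_pessimistic >= threshold         # keep only actions that are safe under pessimism
    return r_optimistic, g_pessimistic, feasible

A planner would then maximize the optimistic value estimates subject to the pessimistic feasibility mask; the clipping to [0, 1] simply assumes rewards and utilities are normalized.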