2022
DOI: 10.1609/aaai.v36i6.20588

Differentially Private Regret Minimization in Episodic Markov Decision Processes

Abstract: We study regret minimization in finite horizon tabular Markov decision processes (MDPs) under the constraints of differential privacy (DP). This is motivated by the widespread applications of reinforcement learning (RL) in real-world sequential decision making problems, where protecting users' sensitive and private information is becoming paramount. We consider two variants of DP -- joint DP (JDP), where a centralized agent is responsible for protecting users' sensitive data and local DP (LDP), where informati…

Cited by 1 publication (1 citation statement)
References 30 publications
“…Other works considered ℓ_p extensions, high-dimensional variants, or improvements and applications of PSCO. Several works have studied the private multi-armed bandit problem (Mishra & Thakurta, 2015; Tossou & Dimitrakakis, 2017; Sajed & Sheffet, 2019; Ren et al., 2020a; Chen et al., 2020; Zhou & Tan, 2021; Dubey, 2021), the private contextual linear bandit problem (Shariff & Sheffet, 2018; Zheng et al., 2020; Han et al., 2020; Ren et al., 2020b; Garcelon et al., 2022), and the more general private reinforcement learning (Vietri et al., 2020; Garcelon et al., 2021; Chowdhury & Zhou, 2022a) problem, in both local and centralized models of privacy. The regret gap between the two models (when the contexts are arbitrary, not stochastic (Han et al., 2021)) has shrunk using the intermediate sequential shuffle model (Tenenbaum et al., 2021; Chowdhury & Zhou, 2022b; Garcelon et al., 2022).…”
Section: Further Related Work
Type: mentioning
confidence: 99%