Differentially Private Regret Minimization in Episodic Markov Decision Processes

Chowdhury, Sayak Ray; Zhou, Xingyu

doi:10.1609/aaai.v36i6.20588

Cited by 1 publication

(1 citation statement)

References 30 publications


“…Other works considered ℓ_p extensions, high dimensional variants, or improvements and applications of PSCO. Several works have studied the private multiarmed bandit problem (Mishra & Thakurta, 2015; Tossou & Dimitrakakis, 2017; Sajed & Sheffet, 2019; Ren et al, 2020a; Chen et al, 2020; Zhou & Tan, 2021; Dubey, 2021), the private contextual linear bandit problem (Shariff & Sheffet, 2018; Zheng et al, 2020; Han et al, 2020; Ren et al, 2020b; Garcelon et al, 2022), and the more general private reinforcement learning (Vietri et al, 2020; Garcelon et al, 2021; Chowdhury & Zhou, 2022a) problem, in both local and centralized models of privacy. The regret gap between the two models (when the contexts are arbitrary, not stochastic (Han et al, 2021)) has shrunk using the intermediate sequential shuffle model (Tenenbaum et al, 2021; Chowdhury & Zhou, 2022b; Garcelon et al, 2022).…”

Section: Further Related Work (mentioning)

confidence: 99%

Concurrent Shuffle Differential Privacy Under Continual Observation

Tenenbaum, Kaplan, Mansour, et al. 2023

Preprint


We introduce the concurrent shuffle model of differential privacy. In this model we have multiple concurrent shufflers permuting messages from different, possibly overlapping, batches of users. Similarly to the standard (single) shuffle model, the privacy requirement is that the concatenation of all shuffled messages should be differentially private.

We study the private continual summation problem (a.k.a. the counter problem) and show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model. Specifically, we give a summation algorithm with error Õ(n^{1/(2k+1)}) with k concurrent shufflers on a sequence of length n. Furthermore, we prove that this bound is tight for any k, even if the algorithm can choose the sizes of the batches adaptively. For k = log n shufflers, the resulting error is polylogarithmic, much better than Θ(n^{1/3}) which we show is the smallest possible with a single shuffler.

We use our online summation algorithm to get algorithms with improved regret bounds for the contextual linear bandit problem. In particular we get optimal Õ(√n) regret with k = Ω(log n) concurrent shufflers.
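To make the arithmetic behind the stated bound concrete, the sketch below evaluates only the polynomial factor n^{1/(2k+1)} for a few values of k. It is an illustration of the formula quoted in the abstract, not the paper's summation algorithm. The function name `poly_error_factor`, the choice n = 10^6, and the use of a base-2 logarithm for "k = log n" are assumptions made for this example; constants and the polylogarithmic factors hidden by Õ are ignored.

```python
import math


def poly_error_factor(n: int, k: int) -> float:
    """Return n^(1/(2k+1)), the polynomial part of the error bound.

    Hypothetical helper for illustration only: it drops the constants
    and polylogarithmic factors hidden by the O-tilde notation.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive integers")
    return n ** (1.0 / (2 * k + 1))


if __name__ == "__main__":
    n = 1_000_000  # assumed sequence length, chosen for illustration
    k_log = math.ceil(math.log2(n))  # assumption: "log n" read as base 2
    for k in (1, 2, 5, k_log):
        print(f"k={k:>2}  n^(1/(2k+1)) = {poly_error_factor(n, k):.3f}")
```

With a single shuffler (k = 1) the factor is n^{1/3}, matching the Θ(n^{1/3}) single-shuffler bound in the abstract. Once k reaches log n the factor drops below a small constant, so only the polylogarithmic terms remain.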
