Shuffle Private Linear Contextual Bandits

Chowdhury, Sayak Ray; Zhou, Xingyu

doi:10.48550/arxiv.2202.05567

Cited by 2 publications

(11 citation statements)

References 14 publications

(20 reference statements)

“…, which clearly has optimal asymptotic dependence on n even without privacy. This improves over the result of Chowdhury & Zhou (2022b) who obtained regret of Õ(n^{3/5}/√ε) in the sequential shuffle model. That is, we reduce the regret to optimal, from Õ(n^{3/5}/√ε) to Õ(√n/√ε), while essentially maintaining the same trust model (small number of concurrent shufflers instead of just one).…”

Section: Our Contributions (supporting)

confidence: 70%

“…That is, we reduce the regret to optimal, from Õ(n^{3/5}/√ε) to Õ(√n/√ε), while essentially maintaining the same trust model (small number of concurrent shufflers instead of just one). This answers positively the open question of Chowdhury & Zhou (2022b).…”

Section: Our Contributions (supporting)

confidence: 58%

“…Several works have studied the private multiarmed bandit problem (Mishra & Thakurta, 2015; Tossou & Dimitrakakis, 2017; Sajed & Sheffet, 2019; Ren et al, 2020a; Chen et al, 2020; Zhou & Tan, 2021; Dubey, 2021), the private contextual linear bandit problem (Shariff & Sheffet, 2018; Zheng et al, 2020; Han et al, 2020; Ren et al, 2020b; Garcelon et al, 2022), and the more general private reinforcement learning (Vietri et al, 2020; Garcelon et al, 2021; Chowdhury & Zhou, 2022a) problem, in both local and centralized models of privacy. The regret gap between the two models (when the contexts are arbitrary, not stochastic (Han et al, 2021)) has shrunk using the intermediate sequential shuffle model (Tenenbaum et al, 2021; Chowdhury & Zhou, 2022b; Garcelon et al, 2022). See Section 5 for further discussion of these results for private contextual linear bandits.…”

Section: Further Related Work (mentioning)

confidence: 99%

“…To adapt the shuffle model to adaptive algorithms (e.g., bandits, sum estimates etc.) under continual observation, Tenenbaum et al (2021); Cheu et al (2022) and Chowdhury & Zhou (2022b) divide the users into continuous batches, and run a shuffle-DP (SDP) mechanism over each batch separately. When a new batch starts, the server selects the next shuffle mechanism (encoder and size), possibly as a function of the outputs of the previous shuffle mechanisms (i.e., it may be adaptive).…”

Section: Concurrent Shuffle Differential Privacy (mentioning)

confidence: 99%

“…Recall that the private data of the t'th user is its value b_t. In the sequential shuffle model (Tenenbaum et al, 2021; Cheu et al, 2022; Chowdhury & Zhou, 2022b), since each user participates in exactly one shuffle mechanism, we ensure (ε, δ) differential privacy of the entire algorithm by making each executed shuffle mechanism (ε, δ)-SDP.…”

Section: Concurrent Shuffle Differential Privacy (mentioning)

confidence: 99%
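
The batching idea described in the two statements above can be illustrated with a minimal Python sketch. It is not the mechanism of any of the cited papers: binary randomized response stands in for the encoder, the batch size is fixed rather than chosen adaptively by the server, and `p_flip` is a hypothetical parameter that is not calibrated to any (ε, δ)-SDP guarantee. All function names are illustrative.

```python
import random


def randomized_response(bit, p_flip, rng):
    """Stand-in local encoder: flip the private bit with probability p_flip."""
    return bit ^ 1 if rng.random() < p_flip else bit


def shuffle_batch(bits, p_flip, rng):
    """Encode every user's bit, then permute the messages of this batch only."""
    messages = [randomized_response(b, p_flip, rng) for b in bits]
    rng.shuffle(messages)  # the shuffler hides which user sent which message
    return messages


def debiased_sum(messages, p_flip):
    """Unbiased estimate of the batch sum under randomized response.

    E[message] = p_flip + b * (1 - 2 * p_flip), so we invert that affine map.
    """
    n = len(messages)
    return (sum(messages) - n * p_flip) / (1 - 2 * p_flip)


def sequential_shuffle_sums(bits, batch_size, p_flip, seed=0):
    """Split users into consecutive batches; each user joins exactly one shuffle."""
    rng = random.Random(seed)
    estimates = []
    for start in range(0, len(bits), batch_size):
        batch = bits[start:start + batch_size]
        estimates.append(debiased_sum(shuffle_batch(batch, p_flip, rng), p_flip))
    return estimates
```

Because every user appears in exactly one batch, the privacy of the whole run reduces to the privacy of each per-batch mechanism, which is the point made in the second statement. With `p_flip=0.0` the estimates coincide with the exact batch sums, which gives a quick sanity check of the debiasing step.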

Concurrent Shuffle Differential Privacy Under Continual Observation

Tenenbaum, Kaplan, Mansour et al. 2023

Preprint

We introduce the concurrent shuffle model of differential privacy. In this model we have multiple concurrent shufflers permuting messages from different, possibly overlapping, batches of users. Similarly to the standard (single) shuffle model, the privacy requirement is that the concatenation of all shuffled messages should be differentially private. We study the private continual summation problem (a.k.a. the counter problem) and show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model. Specifically, we give a summation algorithm with error Õ(n^{1/(2k+1)}) with k concurrent shufflers on a sequence of length n. Furthermore, we prove that this bound is tight for any k, even if the algorithm can choose the sizes of the batches adaptively. For k = log n shufflers, the resulting error is polylogarithmic, much better than Θ(n^{1/3}) which we show is the smallest possible with a single shuffler. We use our online summation algorithm to get algorithms with improved regret bounds for the contextual linear bandit problem. In particular we get optimal Õ(√n) regret with k = Ω(log n) concurrent shufflers.
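
To get a feel for how the stated error rate n^{1/(2k+1)} behaves as the number of shufflers k grows, the following Python sketch evaluates only the polynomial term. It deliberately ignores the constants and logarithmic factors hidden by the Õ notation, so the values are indicative of scaling only, and the function names are illustrative.

```python
import math


def summation_error_exponent(k):
    """Exponent of n in the stated error bound for k concurrent shufflers."""
    return 1.0 / (2 * k + 1)


def error_scale(n, k):
    """Polynomial part n^(1/(2k+1)) of the bound; constants and logs are dropped."""
    return n ** summation_error_exponent(k)


if __name__ == "__main__":
    n = 10 ** 6
    # k = 1 recovers the single-shuffler rate n^(1/3).
    for k in (1, 2, 5, math.ceil(math.log2(n))):
        print(f"k={k:2d}  exponent={summation_error_exponent(k):.4f}  "
              f"scale={error_scale(n, k):.3f}")
```

For k = 1 the exponent is 1/3, matching the single-shuffler bound quoted in the abstract. Once k is on the order of log n, the polynomial term is bounded by a constant, which is consistent with the abstract's claim that the remaining error is polylogarithmic.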

Distributed Linear Bandits With Differential Privacy

Li, Zhou 2024

IEEE Trans. Netw. Sci. Eng.

In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost, but often leads to privacy concerns. To tackle this problem, we consider distributed linear bandits with differential privacy, where a subset of users from the population are selected (called clients) to participate in the learning process and the central server learns the global model from such partial feedback by iteratively aggregating these clients' local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which can be naturally integrated with popular differential privacy (DP) models (including central DP, local DP, and shuffle DP). Furthermore, we show that DP-DPE achieves both sublinear regret and sublinear communication cost. Interestingly, DP-DPE also achieves privacy protection "for free" in the sense that the additional cost due to privacy guarantees is a lower-order additive term. In addition, as a by-product of our techniques, the same results of "free" privacy can also be achieved for the standard differentially private linear bandits. Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE.
