2021
DOI: 10.48550/arxiv.2102.07929
Preprint

Optimal Algorithms for Private Online Learning in a Stochastic Environment

Abstract: We consider two variants of private stochastic online learning. The first variant is differentially private stochastic bandits. Previously, Sajed and Sheffet (2019) devised the DP Successive Elimination (DP-SE) algorithm that achieves the optimal $O\left(\sum_{1 \le j \le K:\, \Delta_j > 0} \frac{\log T}{\Delta_j} + \frac{K \log T}{\epsilon}\right)$ problem-dependent regret bound, where $K$ is the number of arms, $\Delta_j$ is the mean reward gap of arm $j$, $T$ is the time horizon, and $\epsilon$ is the required privacy parameter. However, like other elimination-style algorithms, it is not an anytime…
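For intuition about the kind of algorithm the abstract describes, here is a minimal Python sketch of a differentially private successive-elimination bandit in the spirit of DP-SE (Sajed & Sheffet, 2019). It is an illustration under assumed design choices (doubling epoch lengths, Laplace noise added to each arm's epoch sum, a heuristic confidence width), not the authors' exact algorithm or its analysis.

```python
# Hypothetical sketch of DP successive elimination; epoch schedule,
# noise calibration, and confidence width are illustrative assumptions.
import math
import random


def dp_successive_elimination(pull_arm, K, T, epsilon):
    """pull_arm(j) returns a reward in [0, 1]; K arms, horizon T, privacy epsilon."""
    active = list(range(K))
    t, epoch = 0, 1
    while t < T and len(active) > 1:
        n = 2 ** epoch  # assumed doubling epoch length
        means = {}
        for j in active:
            total = sum(pull_arm(j) for _ in range(n))
            t += n
            # Each epoch uses fresh samples, so adding Laplace(1/epsilon)
            # noise to the sum (sensitivity 1 for rewards in [0, 1])
            # keeps each individual reward epsilon-DP. The difference of
            # two independent Exp(epsilon) draws is Laplace(0, 1/epsilon).
            noisy = total + random.expovariate(epsilon) - random.expovariate(epsilon)
            means[j] = noisy / n
        # Heuristic width: sampling error plus the noise's contribution.
        width = math.sqrt(math.log(K * T) / (2 * n)) + math.log(K * T) / (epsilon * n)
        best = max(means.values())
        active = [j for j in active if means[j] >= best - 2 * width]
        epoch += 1
    # Commit to the surviving arm for the remaining rounds.
    while t < T:
        pull_arm(active[0])
        t += 1
    return active[0]
```

Note the property the abstract alludes to: an elimination-style scheme like this commits to epoch lengths chosen for a known horizon T, which is why it is not an anytime algorithm.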

Cited by 4 publications (4 citation statements) | References 9 publications
“…Besides the papers mentioned above, there is other related work on differentially private online learning (Guha Thakurta and Smith, 2013; Agarwal and Singh, 2017) and multi-armed bandits (Tossou and Dimitrakakis, 2017; Hu et al., 2021; Sajed and Sheffet, 2019). In the linear bandit setting with contextual information, Shariff and Sheffet (2018) show an impossibility result: no algorithm can achieve a standard (ε, δ)-DP privacy guarantee while guaranteeing sublinear regret, and thus the relaxed notion of JDP is considered in their paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…Garcelon et al. (2020) consider stationary transition kernels. Related work: besides the papers mentioned above, there is other related work on differentially private online learning (Guha Thakurta and Smith, 2013; Agarwal and Singh, 2017) and multi-armed bandits (Tossou and Dimitrakakis, 2017; Hu, Huang, and Mehta, 2021; Sajed and Sheffet, 2019; Gajane, Urvoy, and Kaufmann, 2018; Chen et al., 2020). In the RL setting, in addition to Vietri et al. (2020) and Garcelon et al. (2020), which focus on value-iteration-based regret minimization algorithms under privacy constraints, Balle, Gomrokchi, and Precup (2016) consider private policy evaluation with linear function approximation.…”
Section: We Revisit Private Optimistic Value-iteration In Tabular… (mentioning)
confidence: 99%
“…Related work: besides the papers mentioned above, there is other related work on differentially private online learning (Guha Thakurta and Smith, 2013; Agarwal and Singh, 2017) and multi-armed bandits (Tossou and Dimitrakakis, 2017; Hu et al., 2021; Sajed and Sheffet, 2019; Gajane et al., 2018; Chen et al., 2020). In the RL setting, in addition to Vietri et al. (2020) and Garcelon et al. (2020), which focus on value-iteration-based regret minimization algorithms under privacy constraints, Balle et al. (2016) consider private policy evaluation with linear function approximation.…”
Section: Algorithm (mentioning)
confidence: 99%