2021
DOI: 10.48550/arxiv.2102.07929
Preprint

Optimal Algorithms for Private Online Learning in a Stochastic Environment

Abstract: We consider two variants of private stochastic online learning. The first variant is differentially private stochastic bandits. Previously, Sajed and Sheffet (2019) devised the DP Successive Elimination (DP-SE) algorithm that achieves the optimal $O\left(\sum_{1 \le j \le K:\, \Delta_j > 0} \frac{\log T}{\Delta_j} + \frac{K \log T}{\epsilon}\right)$ problem-dependent regret bound, where $K$ is the number of arms, $\Delta_j$ is the mean reward gap of arm $j$, $T$ is the time horizon, and $\epsilon$ is the required privacy parameter. However, like other elimination-style algorithms, it is not an anytime…
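For intuition about the kind of algorithm the abstract describes, here is a minimal Python sketch of a differentially private successive-elimination bandit in the spirit of DP-SE (Sajed & Sheffet, 2019). It is an illustration under assumed design choices (doubling epoch lengths, Laplace noise added to each arm's epoch sum, a heuristic confidence width), not the authors' exact algorithm or its analysis.

```python
# Hypothetical sketch of DP successive elimination; epoch schedule,
# noise calibration, and confidence width are illustrative assumptions.
import math
import random


def dp_successive_elimination(pull_arm, K, T, epsilon):
    """pull_arm(j) returns a reward in [0, 1]; K arms, horizon T, privacy epsilon."""
    active = list(range(K))
    t, epoch = 0, 1
    while t < T and len(active) > 1:
        n = 2 ** epoch  # assumed doubling epoch length
        means = {}
        for j in active:
            total = sum(pull_arm(j) for _ in range(n))
            t += n
            # Each epoch uses fresh samples, so adding Laplace(1/epsilon)
            # noise to the sum (sensitivity 1 for rewards in [0, 1])
            # keeps each individual reward epsilon-DP. The difference of
            # two independent Exp(epsilon) draws is Laplace(0, 1/epsilon).
            noisy = total + random.expovariate(epsilon) - random.expovariate(epsilon)
            means[j] = noisy / n
        # Heuristic width: sampling error plus the noise's contribution.
        width = math.sqrt(math.log(K * T) / (2 * n)) + math.log(K * T) / (epsilon * n)
        best = max(means.values())
        active = [j for j in active if means[j] >= best - 2 * width]
        epoch += 1
    # Commit to the surviving arm for the remaining rounds.
    while t < T:
        pull_arm(active[0])
        t += 1
    return active[0]
```

Note the property the abstract alludes to: an elimination-style scheme like this commits to epoch lengths chosen for a known horizon T, which is why it is not an anytime algorithm.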

Cited by 4 publications (4 citation statements) | References 9 publications
“…Besides the papers mentioned above, there is other related work on differentially private online learning (Guha Thakurta and Smith, 2013; Agarwal and Singh, 2017) and multi-armed bandits (Tossou and Dimitrakakis, 2017; Hu et al., 2021; Sajed and Sheffet, 2019). In the linear bandit setting with contextual information, Shariff and Sheffet (2018) show an impossibility result: no algorithm can achieve a standard (ε, δ)-DP privacy guarantee while guaranteeing sublinear regret, and thus the relaxed notion of JDP is considered in their paper.…”
Section: Related Work (mentioning)
confidence: 99%
“…Garcelon et al. (2020) consider stationary transition kernels. Related work: besides the papers mentioned above, there is other related work on differentially private online learning (Guha Thakurta and Smith, 2013; Agarwal and Singh, 2017) and multi-armed bandits (Tossou and Dimitrakakis, 2017; Hu, Huang, and Mehta, 2021; Sajed and Sheffet, 2019; Gajane, Urvoy, and Kaufmann, 2018; Chen et al., 2020). In the RL setting, in addition to Vietri et al. (2020) and Garcelon et al. (2020), which focus on value-iteration-based regret minimization algorithms under privacy constraints, Balle, Gomrokchi, and Precup (2016) consider private policy evaluation with linear function approximation.…”
Section: We Revisit Private Optimistic Value-iteration In Tabular… (mentioning)
confidence: 99%
“…Related work: besides the papers mentioned above, there is other related work on differentially private online learning (Guha Thakurta and Smith, 2013; Agarwal and Singh, 2017) and multi-armed bandits (Tossou and Dimitrakakis, 2017; Hu et al., 2021; Sajed and Sheffet, 2019; Gajane et al., 2018; Chen et al., 2020). In the RL setting, in addition to Vietri et al. (2020) and Garcelon et al. (2020), which focus on value-iteration-based regret minimization algorithms under privacy constraints, Balle et al. (2016) consider private policy evaluation with linear function approximation.…”
Section: Algorithm (mentioning)
confidence: 99%