Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/792
Thompson Sampling on Symmetric Alpha-Stable Bandits

Abstract: Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric α-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to…
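The abstract's core mechanism can be illustrated with a minimal Thompson Sampling loop. The sketch below uses Bernoulli rewards with Beta priors — a standard textbook instance chosen for its conjugate update, not the paper's α-stable setting — to show how prior knowledge enters through the posterior and how sampling from it drives arm selection.

```python
import numpy as np

def thompson_sampling_bernoulli(probs, horizon, rng):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch).

    probs   -- true Bernoulli means, one per arm (unknown to the learner)
    horizon -- number of rounds to play
    rng     -- numpy random Generator
    Returns per-arm pull counts and the Beta posterior parameters.
    """
    n_arms = len(probs)
    alpha = np.ones(n_arms)  # Beta(1, 1) = uniform prior on each arm's mean
    beta = np.ones(n_arms)
    counts = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, play the argmax.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        reward = int(rng.random() < probs[arm])  # Bernoulli reward draw
        alpha[arm] += reward                     # conjugate posterior update
        beta[arm] += 1 - reward
        counts[arm] += 1
    return counts, alpha, beta
```

An informative prior (e.g. Beta(5, 2) for an arm believed to pay off often) would simply replace the uniform initialization, which is exactly how prior knowledge is injected.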

Cited by 6 publications (10 citation statements)
References 10 publications
“…However, such assumptions may not always hold when designing decision-making algorithms for real-world complex systems. In particular, previous papers have shown that the rewards or the interactions in such systems often lead to heavy-tailed and power-law distributions [12], such as modeling stock prices [6], preferential attachment in social networks [30], and online behavior on websites [25]. Thus, new methods are needed to deal with these heavy-tailed rewards in private bandit learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…When we model stock prices or deal with behaviour in social networks, the interactive data often exhibit heavy tails and negative skewness [Oja, 1981]. Dubey and Pentland [2019] propose the symmetric α-Thompson Sampling method to fit heavy-tailed data. In our case, we use an asymmetric α-Thompson Sampling method to fit data with a negatively skewed distribution.…”
Section: Background Materials (mentioning)
confidence: 99%
“…Considering the setting, for each arm n, the corresponding reward distribution is given by D_n = S_α(σ, β, µ_n), where α ∈ (1, 2) and σ ∈ ℝ⁺ are known in advance, as in the Dubey and Pentland [2019] study, and µ_n is unknown. Note that for a symmetric α-stable distribution (β = 0), E[r_n] = µ_n, and hence we set a prior distribution over the variable µ_n, which is the expected reward for arm n. We can see that since the only unknown parameter for the reward distributions is µ_n, D_n is parameterized by θ_n = µ_n.…”
Section: Symmetric α-Thompson Sampling (mentioning)
confidence: 99%
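The reward model quoted above, D_n = S_α(σ, β = 0, µ_n), can be simulated with the standard Chambers–Mallows–Stuck method. The sketch below draws symmetric α-stable rewards; it covers only reward generation, not the paper's posterior inference, and the function name is illustrative.

```python
import numpy as np

def symmetric_alpha_stable(alpha, sigma, mu, size, rng):
    """Draw from S_alpha(sigma, beta=0, mu) via Chambers-Mallows-Stuck.

    For the symmetric case (beta = 0) the standardized draw is
        X = sin(alpha*U) / cos(U)**(1/alpha)
            * (cos((1 - alpha)*U) / W)**((1 - alpha)/alpha)
    with U ~ Uniform(-pi/2, pi/2) and W ~ Exp(1); the result is then
    scaled by sigma and shifted by mu. For alpha in (1, 2] the mean
    exists and equals mu, matching E[r_n] = mu_n in the quoted text.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return mu + sigma * x
```

With α = 2 this reduces to a Gaussian with standard deviation σ√2; smaller α gives progressively heavier tails, which is what breaks the sub-Gaussian assumptions of classical bandit analyses.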
“…Robust estimation has a rich history in the bandit literature. Robustness to heavy-tailed reward distributions has been extensively explored in the stochastic multi-armed setting, from the initial work of Bubeck et al. [7], which proposed a UCB algorithm based on robust mean estimators, to the subsequent work of [12, 42, 44] on both Bayesian and frequentist algorithms for the same problem. Contamination in bandits has also been explored in the stochastic single-agent case, such as the work of [1] on best-arm identification under contamination, and algorithms that are jointly optimal for the stochastic and nonstochastic cases [34], which use a modification of the popular EXP3 algorithm.…”
Section: Related Work (mentioning)
confidence: 99%