Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/792
Thompson Sampling on Symmetric Alpha-Stable Bandits

Abstract: Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric α-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to…
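The abstract's core mechanism can be illustrated with a minimal Thompson Sampling loop. The sketch below uses Bernoulli rewards with Beta priors — a standard textbook instance chosen for its conjugate update, not the paper's α-stable setting — to show how prior knowledge enters through the posterior and how sampling from it drives arm selection.

```python
import numpy as np

def thompson_sampling_bernoulli(probs, horizon, rng):
    """Beta-Bernoulli Thompson Sampling (illustrative sketch).

    probs   -- true Bernoulli means, one per arm (unknown to the learner)
    horizon -- number of rounds to play
    rng     -- numpy random Generator
    Returns per-arm pull counts and the Beta posterior parameters.
    """
    n_arms = len(probs)
    alpha = np.ones(n_arms)  # Beta(1, 1) = uniform prior on each arm's mean
    beta = np.ones(n_arms)
    counts = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        # Sample one plausible mean per arm from its posterior, play the argmax.
        theta = rng.beta(alpha, beta)
        arm = int(np.argmax(theta))
        reward = int(rng.random() < probs[arm])  # Bernoulli reward draw
        alpha[arm] += reward                     # conjugate posterior update
        beta[arm] += 1 - reward
        counts[arm] += 1
    return counts, alpha, beta
```

An informative prior (e.g. Beta(5, 2) for an arm believed to pay off often) would simply replace the uniform initialization, which is exactly how prior knowledge is injected.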

Cited by 6 publications (10 citation statements)
References 10 publications
“…However, such assumptions may not always hold when designing decision-making algorithms for real-world complex systems. In particular, previous papers have shown that the rewards or the interactions in such systems often lead to heavy-tailed and power-law distributions [12], such as modeling stock prices [6], preferential attachment in social networks [30], and online behavior on websites [25]. Thus, new methods are needed to deal with these heavy-tailed rewards in private bandit learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…When we model stock prices or deal with behaviour in social networks, the interactive data often exhibit heavy tails and negative skewness [Oja, 1981]. Dubey and Pentland [2019] propose the symmetric α-Thompson Sampling method to fit heavy-tailed data. In our case, we use an asymmetric α-Thompson Sampling method to fit data with a negatively skewed distribution.…”
Section: Background Materials (mentioning)
confidence: 99%
“…Considering the setting, for each arm n, the corresponding reward distribution is given by D_n = S_α(σ, β, µ_n), where α ∈ (1, 2) and σ ∈ ℝ⁺ are known in advance, as in the Dubey and Pentland [2019] study, and µ_n is unknown. Note that for a symmetric α-stable distribution (β = 0), E[r_n] = µ_n, and hence we set a prior distribution over the variable µ_n, which is the expected reward for arm n. We can see that since the only unknown parameter for the reward distributions is µ_n, D_n is parameterized by θ_n = µ_n.…”
Section: Symmetric α-Thompson Sampling (mentioning)
confidence: 99%
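The reward model quoted above, D_n = S_α(σ, β = 0, µ_n), can be simulated with the standard Chambers–Mallows–Stuck method. The sketch below draws symmetric α-stable rewards; it covers only reward generation, not the paper's posterior inference, and the function name is illustrative.

```python
import numpy as np

def symmetric_alpha_stable(alpha, sigma, mu, size, rng):
    """Draw from S_alpha(sigma, beta=0, mu) via Chambers-Mallows-Stuck.

    For the symmetric case (beta = 0) the standardized draw is
        X = sin(alpha*U) / cos(U)**(1/alpha)
            * (cos((1 - alpha)*U) / W)**((1 - alpha)/alpha)
    with U ~ Uniform(-pi/2, pi/2) and W ~ Exp(1); the result is then
    scaled by sigma and shifted by mu. For alpha in (1, 2] the mean
    exists and equals mu, matching E[r_n] = mu_n in the quoted text.
    """
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    x = (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
         * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return mu + sigma * x
```

With α = 2 this reduces to a Gaussian with standard deviation σ√2; smaller α gives progressively heavier tails, which is what breaks the sub-Gaussian assumptions of classical bandit analyses.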
“…Robust estimation has a rich history in the bandit literature. Robustness to heavy-tailed reward distributions has been extensively explored in the stochastic multi-armed setting, from the initial work of Bubeck et al. [7], which proposed a UCB algorithm based on robust mean estimators, to the subsequent work of [12, 42, 44] on both Bayesian and frequentist algorithms for the same problem. Contamination in bandits has also been explored in the stochastic single-agent case, such as the work of [1] on best-arm identification under contamination, and algorithms that are jointly optimal for the stochastic and nonstochastic cases [34], which use a modification of the popular EXP3 algorithm.…”
Section: Related Work (mentioning)
confidence: 99%