2013
DOI: 10.1109/tit.2013.2277869

Bandits With Heavy Tail

Abstract: The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order 1 + ε, for some ε ∈ (0, 1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean.
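As a concrete illustration of the truncated empirical mean mentioned in the abstract, here is a minimal Python sketch. It assumes i.i.d. rewards, a known bound u on E|X|^(1+ε), and a confidence level δ; the exact form of the threshold and its constants are assumptions made for readability, not a verbatim transcription of the paper's estimator.

```python
import numpy as np

def truncated_mean(x, eps, u, delta):
    """Truncated empirical mean for heavy-tailed samples (sketch).

    x     : 1-D array of i.i.d. observations
    eps   : moment parameter; E|X|^(1+eps) <= u is assumed finite
    u     : assumed bound on the (1+eps)-th raw moment
    delta : confidence level of the accompanying deviation bound

    Observation i is kept only if |x_i| lies below a threshold that grows
    with i; larger outliers are replaced by zero before averaging.
    Threshold constants here are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    idx = np.arange(1, n + 1)
    thresh = (u * idx / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return float(np.mean(np.where(np.abs(x) <= thresh, x, 0.0)))
```

With eps = 1 (finite variance) the threshold grows like the square root of i, so only increasingly rare outliers get clipped as more data arrives.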

Cited by 219 publications (264 citation statements) · References 18 publications · Citing publications: 2017–2024

Citation statements (ordered by relevance):
“…We conduct simulations with the following benchmarks - (i) an ε-greedy agent with linearly decreasing ε, (ii) Regular TS with Gaussian priors and a Gaussian assumption on the data (Gaussian-TS), (iii) Robust-UCB [Bubeck et al, 2013] for heavy-tailed distributions using a truncated mean estimator, and (iv) α-TS and (v) Robust α-TS, both with Q (the number of sampling iterations) set to 50.…”
Section: Methods (mentioning)
confidence: 99%
“…A version of the UCB algorithm [Auer et al, 2002] has been proposed in Bubeck et al [2013], coupled with several robust mean estimators, to obtain Robust-UCB algorithms with optimal problem-dependent (i.e., dependent on the individual µ_k's) regret when rewards are heavy-tailed.…”
Section: Related Work (mentioning)
confidence: 99%
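Building on the truncated_mean sketch above, the Robust-UCB idea referenced in this statement can be illustrated as an index policy: each arm's index is a robust mean estimate plus a confidence width. The width constant c, the t^(-2) confidence schedule, and the names robust_ucb and pull are assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

def robust_ucb(pull, K, T, eps, u, c=4.0):
    """Robust-UCB-style index policy with a truncated-mean plug-in (sketch).

    pull(k) : callable returning one (possibly heavy-tailed) reward of arm k
    K, T    : number of arms and time horizon
    eps, u  : moment assumption E|X|^(1+eps) <= u for every arm
    c       : confidence-width constant (an illustrative assumption)
    """
    rewards = [[] for _ in range(K)]
    for t in range(1, T + 1):
        if t <= K:
            k = t - 1                      # play each arm once to initialize
        else:
            delta = t ** -2                # shrinking confidence level
            indices = []
            for a in range(K):
                s = len(rewards[a])
                mu = truncated_mean(np.array(rewards[a]), eps, u, delta)
                width = c * u ** (1.0 / (1.0 + eps)) * \
                        (np.log(1.0 / delta) / s) ** (eps / (1.0 + eps))
                indices.append(mu + width)
            k = int(np.argmax(indices))    # optimistic arm choice
        rewards[k].append(pull(k))
    return rewards
```

The shrinking t^(-2) confidence level makes the union bound over rounds summable, which is the usual way such index policies keep the probability of persistently under-estimating the best arm small.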
“…Theorem 3.7 shows that such constant effort is inevitable for all (ε, δ)-approximating algorithms, with a lower bound that reproduces the dependence on K and δ. The cost bound (5) also reveals the adaptation of our algorithm to the input; the lower bounds of Theorems 3.9 and 3.10 exhibit the same dependence on ε, δ, and the norm. In doing so, we obtain (up to constants) tight upper and lower bounds for the (ε, δ)-complexity of computing expected values (2).…”
Section: Introduction (mentioning)
confidence: 64%
“…Irrespective of the specified accuracy ε and the input random variable, the cost bound (5) is at least some constant times K^{pq/(q−p)} log δ^{−1}. This fixed cost comes from estimating ‖Y − E Y‖_1 within our particular algorithm and can be interpreted as the price we need to pay for not knowing the statistical dispersion of Y.…”
Section: Introduction (mentioning)
confidence: 99%
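The (ε, δ)-type guarantees these statements discuss are the kind delivered by median-of-means estimation, another robust mean estimator considered in this line of work. Below is a minimal sketch assuming i.i.d. samples and a block count proportional to log(1/δ); the constant 8 and the function name are illustrative assumptions.

```python
import numpy as np

def median_of_means(x, delta):
    """Median-of-means mean estimator (sketch).

    Splits the sample into roughly 8*log(1/delta) equal blocks, averages
    each block, and returns the median of the block means.  The block-count
    constant is an assumption; small multiples of log(1/delta) are typical.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    b = max(1, min(n, int(np.ceil(8.0 * np.log(1.0 / delta)))))
    blocks = np.array_split(x, b)
    return float(np.median([blk.mean() for blk in blocks]))
```

Each block mean only needs a finite variance (or a finite 1 + ε moment) to concentrate, and taking the median across blocks turns that weak per-block guarantee into a failure probability decaying exponentially in the number of blocks, which is where the log(1/δ) factor in such cost bounds typically comes from.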