2021
DOI: 10.48550/arxiv.2109.13595
Preprint

The Fragility of Optimized Bandit Algorithms

Abstract: Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the associated algorithms necessarily have the undesirable feature that the tail of the regret distribution behaves like that o…
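The abstract contrasts logarithmically growing expected regret (the Lai-Robbins rate) with heavy regret tails. As a minimal sketch of the kind of optimized-index policy under discussion, the following uses the textbook UCB1 index on a two-armed Gaussian bandit; the arm means (0 and 0.5), noise level, and horizons are arbitrary illustrative choices, not taken from the paper:

```python
import math
import random

def ucb1(means, horizon, rng):
    """Run the textbook UCB1 policy on Gaussian arms; return cumulative pseudo-regret."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialize the indices
        else:
            # UCB1 index: empirical mean plus confidence radius
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = means[arm] + rng.gauss(0.0, 1.0)  # unit-variance Gaussian noise
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret: gap of the arm played
    return regret

rng = random.Random(0)
# Expected pseudo-regret should grow roughly logarithmically in the horizon.
for T in (1000, 10000):
    print(T, round(ucb1([0.0, 0.5], T, rng), 1))
```

A single run only shows the typical (logarithmic) behavior; the paper's point is about the upper tail of this regret across runs, which the average does not reveal.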

Cited by 2 publications (5 citation statements) | References 17 publications
“…A key assumption underlying our analysis is that the misspecified Gaussian distributions are sufficiently diffuse. The importance of diffuseness has also been highlighted in work on frequentist analysis of Thompson sampling and KL-UCB applied to bandits with independent arms (Honda and Takemura, 2013;Wager and Xu, 2021;Fan and Glynn, 2021). In contrast, our results apply to any agent and allow for generalization across arms.…”
Section: Introduction
confidence: 53%
“…We argue that instance-dependent consistency and light-tailed risk are incompatible. Building on the analysis and results in Fan and Glynn (2021b), we find that a wide range of policies, including the standard UCB and SE policies and the TS policy, suffer from heavy-tailed risk. More precisely, each of these three policies incurs linear regret with probability Ω(poly(1/T)) = exp(−O(ln T)).…”
Section: Our Contributions
confidence: 93%
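The heavy-tail claim quoted above can be illustrated (not proved) with a crude Monte Carlo over independent UCB1 runs, comparing the median regret to the worst run observed. This is the textbook UCB1 index, not any specific policy from the cited papers, and the gap, horizon, and replication count are arbitrary:

```python
import math
import random

def ucb1_regret(gap, horizon, rng):
    """Pseudo-regret of one UCB1 run on a two-armed Gaussian bandit with the given gap."""
    means = [gap, 0.0]  # arm 0 is optimal
    counts = [0, 0]
    sums = [0.0, 0.0]
    pulls_bad = 0
    for t in range(1, horizon + 1):
        if t <= 2:
            arm = t - 1  # initialize both arms
        else:
            arm = max(
                (0, 1),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += means[arm] + rng.gauss(0.0, 1.0)
        if arm == 1:
            pulls_bad += 1
    return gap * pulls_bad

rng = random.Random(1)
regrets = sorted(ucb1_regret(0.5, 2000, rng) for _ in range(200))
median = regrets[100]  # (upper) median of the 200 runs
worst = regrets[-1]
print(f"median regret {median:.1f}, worst of 200 runs {worst:.1f}")
```

Runs where the optimal arm is badly under-estimated early produce regret far above the median, which is the failure mode the quoted statement attributes polynomial (rather than exponentially small) probability.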
“…Our policy design resolves two issues that create large regret: (i) spending too much time before correctly discarding a sub-optimal arm, and (ii) wrongly discarding the optimal arm due to under-estimation (pointed out by Fan and Glynn (2021b)). Despite the simplicity of our proposed policy design, the associated proof techniques are novel and may be useful for broader analysis of regret distributions and tail risk.…”
Section: Our Contributions
confidence: 99%