The Fragility of Optimized Bandit Algorithms

Lin, Faa-Jeng

doi:10.48550/arxiv.2109.13595

Cited by 2 publications

(5 citation statements)

References 17 publications

“…A key assumption underlying our analysis is that the misspecified Gaussian distributions are sufficiently diffuse. The importance of diffuseness has also been highlighted in work on frequentist analysis of Thompson sampling and KL-UCB applied to bandits with independent arms (Honda and Takemura, 2013; Wager and Xu, 2021; Fan and Glynn, 2021). In contrast, our results apply to any agent and allow for generalization across arms.…”

Section: Introduction (mentioning)

confidence: 53%

Gaussian Imagination in Bandit Learning

Liu, Devraj, Roy et al. 2022

Preprint

Assuming distributions are Gaussian often facilitates computations that are otherwise intractable. We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit. Relative to an information-theoretic bound on the Bayesian regret the agent would incur when interacting with the Gaussian bandit, we bound the increase in regret when the agent interacts with the Bernoulli bandit. If the Gaussian prior distribution and likelihood function are sufficiently diffuse, this increase grows at a rate which is at most linear in the square-root of the time horizon, and thus the per-timestep increase vanishes. Our results formalize the folklore that so-called Bayesian agents remain effective when instantiated with diffuse misspecified distributions.
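The setting in this abstract, an agent built around a Gaussian prior and Gaussian likelihood but run against Bernoulli rewards, can be illustrated with a small simulation. The sketch below is not the authors' agent or analysis: it uses Thompson sampling as one familiar Bayesian agent, and the function name `gaussian_ts_on_bernoulli`, the prior mean of 0.5, and the parameters `prior_var` and `noise_var` are all assumptions made for illustration. Larger `prior_var` and `noise_var` correspond to more diffuse distributions.

```python
import random


def gaussian_ts_on_bernoulli(means, horizon, prior_var=1.0, noise_var=1.0, seed=0):
    """Run Thompson sampling with a Gaussian prior and likelihood on Bernoulli arms.

    Returns the cumulative pseudo-regret after `horizon` steps.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = len(means)
    post_mean = [0.5] * k       # assumed prior mean for every arm
    post_var = [prior_var] * k  # larger values mean a more diffuse prior
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a mean for each arm from its Gaussian posterior, then act greedily.
        samples = [rng.gauss(post_mean[a], post_var[a] ** 0.5) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        # The environment is Bernoulli, not Gaussian: this is the misspecification.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        # Conjugate Gaussian update that treats the 0/1 reward as Gaussian-noised.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
        regret += best - means[arm]
    return regret
```

Calling `gaussian_ts_on_bernoulli([0.3, 0.7], horizon=1000)` with different values of `prior_var` and `noise_var` gives a rough feel for how diffuseness affects regret, although a single seeded run says nothing about the bounds stated in the abstract.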

“…We argue that instance-dependent consistency and light-tailed risk are incompatible. Built upon the analysis and results in Fan and Glynn (2021b), we find that a wide range of policies, including the standard UCB and SE policy, and the TS policy, suffer from heavy-tailed risk. More precisely, each of these three policies incurs a linear regret with probability Ω(poly(1/T)) = exp(−O(ln T)).…”

Section: Our Contributions (mentioning)

confidence: 93%
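
The equality at the end of the excerpt above rewrites a polynomial decay rate in exponential form. As a short check, with c > 0 a hypothetical exponent that the excerpt does not specify:

```latex
\[
T^{-c} \;=\; \exp\bigl(\ln T^{-c}\bigr) \;=\; \exp(-c \ln T), \qquad c > 0,
\]
so a probability that is $\Omega(T^{-c})$ for some constant $c > 0$
can equivalently be written as $\exp(-O(\ln T))$.
```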

“…Our policy design resolves two issues that create a large regret: (i) spending too much time before correctly discarding a sub-optimal arm and (ii) wrongly discarding the optimal arm due to under-estimation (pointed out by Fan and Glynn (2021b)). Despite the simplicity of our proposed policy design, the associated proof techniques are novel and may be useful for broader analysis on regret distribution and tail risk.…”

Section: Our Contributions (mentioning)

confidence: 99%

“…Our work is inspired by Fan and Glynn (2021b), who first provided a rigorous formulation to analyze heavy-tailed risk for bandit algorithms and showed that for any information-theoretically optimized bandit policy, the probability of incurring a linear regret is very heavy-tailed: at least Ω(1/T).…”

Section: Related Work (mentioning)

confidence: 99%

“…Theorem 1 suggests that a consistent policy must have a risk tail heavier than an exponential one. To prove Theorem 1, we use a refined version of the change of measure argument originally introduced by Fan and Glynn (2021b). We consider two environments with θ = (1/2, 1) and θ =…”

Section: Instance-dependent Consistency Causes Heavy-tailed Risk (mentioning)

confidence: 99%

A Simple and Optimal Policy Design with Safety against Heavy-tailed Risk for Multi-armed Bandits

Simchi-Levi, Zheng, Zhu 2022

Preprint

We design new policies that ensure both worst-case optimality for expected regret and light-tailed risk for the regret distribution in the stochastic multi-armed bandit problem. Recently, Fan and Glynn (2021b) showed that information-theoretically optimized bandit algorithms suffer from serious heavy-tailed risk; that is, the worst-case probability of incurring a linear regret decays slowly, at a polynomial rate of 1/T, as T (the time horizon) increases. Inspired by their results, we further show that widely used policies (e.g., Upper Confidence Bound, Thompson Sampling) also incur heavy-tailed risk, and that this heavy-tailed risk in fact exists for all "instance-dependent consistent" policies. With the aim of ensuring safety against such heavy-tailed risk, starting from the two-armed bandit setting, we provide a simple policy design that (i) attains worst-case optimality for the expected regret at order Õ(√T) and (ii) has a worst-case tail probability of incurring a linear regret that decays at an optimal exponential rate exp(−Ω(√T)). Next, we improve the policy design and analysis to cover the general K-armed bandit setting. We provide an explicit tail probability bound for any regret threshold under our policy design. Specifically, the worst-case probability of incurring a regret larger than x is upper bounded by exp(−Ω(x/√(KT))). We also enhance the policy design to accommodate the "any-time" setting where T is not known a priori, and prove performance guarantees equivalent to those in the "fixed-time" setting with known T. A brief account of numerical experiments illustrates the theoretical findings. Our results reveal the incompatibility between consistency and light-tailed risk, while indicating that worst-case optimality on expected regret and light-tailed risk on the regret distribution are compatible.
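
The tail probability discussed in this abstract, the chance that regret exceeds a threshold x, can be estimated empirically by repeating a bandit run many times. The sketch below is only an illustration under stated assumptions: it uses the textbook UCB1 index with Bernoulli rewards rather than the policy proposed in the paper, and the names `ucb1_regret` and `empirical_tail` and their parameters are hypothetical.

```python
import math
import random


def ucb1_regret(means, horizon, rng):
    """Cumulative pseudo-regret of textbook UCB1 on Bernoulli arms for one run."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # Empirical mean plus the UCB1 exploration bonus sqrt(2 ln t / n).
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def empirical_tail(means, horizon, threshold, runs=200, seed=0):
    """Fraction of independent runs whose regret exceeds `threshold`."""
    rng = random.Random(seed)
    regrets = [ucb1_regret(means, horizon, rng) for _ in range(runs)]
    return sum(r > threshold for r in regrets) / runs
```

For example, `empirical_tail([0.5, 0.6], horizon=2000, threshold=100.0)` returns the observed fraction of runs with regret above 100. Because the events of interest are rare, a faithful comparison of polynomial and exponential tail decay would likely need far more runs than the default used here.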
