2020
DOI: 10.48550/arxiv.2006.06790
Preprint

On Worst-case Regret of Linear Thompson Sampling

Abstract: In this paper, we consider the worst-case regret of Linear Thompson Sampling (LinTS) for the linear bandit problem. Russo and Van Roy (2014) show that the Bayesian regret of LinTS is bounded above by O(d√T), where T is the time horizon and d is the number of parameters. While this bound matches the minimax lower bounds for this problem up to logarithmic factors, the existence of a similar worst-case regret bound is still unknown. The only known worst-case regret bound for LinTS, due to Agrawal and Goyal (201…
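For orientation, the LinTS algorithm the abstract refers to can be sketched as below: maintain a ridge-regression estimate of the parameter, sample from a Gaussian centered at it, and play the arm that is best for the sample. This is a minimal illustrative sketch under a simulated linear reward model, not the paper's own code; the function name, the fixed arm set, and the simulation loop are assumptions for the example.

```python
import numpy as np

def lin_ts(contexts, theta_star, T, v=1.0, lam=1.0, rng=None):
    """Minimal Linear Thompson Sampling sketch (illustrative).

    contexts:   (K, d) array of fixed arm feature vectors.
    theta_star: (d,) true parameter, used only to simulate rewards.
    v:          posterior scale (the quantity an inflation factor would multiply).
    Returns the cumulative regret over T rounds.
    """
    rng = np.random.default_rng(rng)
    K, d = contexts.shape
    V = lam * np.eye(d)   # regularized design matrix V_t
    b = np.zeros(d)       # running sum of x_t * r_t
    means = contexts @ theta_star
    regret = 0.0
    for t in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b  # ridge estimate
        # Sample from the Gaussian "posterior" N(theta_hat, v^2 * V_inv)
        theta_tilde = rng.multivariate_normal(theta_hat, v**2 * V_inv)
        a = int(np.argmax(contexts @ theta_tilde))  # greedy w.r.t. the sample
        x = contexts[a]
        r = x @ theta_star + rng.normal()           # noisy linear reward
        V += np.outer(x, x)
        b += r * x
        regret += means.max() - means[a]
    return regret
```

On a toy instance with standard-basis arms, the cumulative regret stays well below the trivial linear bound, consistent with the square-root-in-T behavior discussed above.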

Cited by 4 publications (4 citation statements)
References 8 publications
“…Comparison with Different Adaptive Approaches It is worth noting that a similar adaptive approach has been considered in the Gaussian model with known variance (Jin et al. 2021) and linear models (Hamidi and Bayati 2020). In these approaches, the posterior distribution was modeled as a Gaussian distribution and an adaptive inflation value ρ_t was introduced to the scale parameter, which effectively flattened the posterior distributions.…”
Section: Thompson Sampling With Truncation
confidence: 99%
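The inflation mechanism this statement describes — multiplying the scale of the Gaussian posterior by an adaptive factor ρ_t to flatten it — can be sketched as follows. The particular decay schedule below is a purely illustrative assumption, not the one used in the cited works.

```python
import numpy as np

def sample_inflated_posterior(theta_hat, V_inv, rho_t, rng=None):
    """Draw from the flattened posterior N(theta_hat, rho_t^2 * V_inv).

    rho_t > 1 widens the Gaussian relative to the exact posterior,
    so the resulting Thompson sample explores more aggressively.
    """
    rng = np.random.default_rng(rng)
    return rng.multivariate_normal(theta_hat, rho_t**2 * V_inv)

def rho_schedule(t, rho0=2.0):
    # Hypothetical schedule for illustration only: inflate early,
    # decay toward 1 (the exact posterior) as rounds accumulate.
    return 1.0 + (rho0 - 1.0) / np.sqrt(t + 1)
```

The point of making ρ_t adaptive, per the quote, is that a fixed inflation wastes exploration once the estimate is accurate, whereas a data- or time-dependent ρ_t can back off.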
“…Remark 3.2. As shown in Hamidi and Bayati (2020), the assumption that LinTS uses the true posterior distribution for Θ is crucial, as the Bayesian regret of LinTS can grow linearly for exp(Cd) rounds for some constant C > 0.…”
Section: Linear Thompson Sampling
confidence: 99%
“…Accordingly, the study of theoretical performance guarantees for Thompson sampling has gained much popularity and made significant progress in the recent literature, with an emphasis on high-probability instance-dependent regret. First, regret bounds growing as the square root of time were shown for adversarial contextual bandits (Agrawal and Goyal, 2013; Russo and Van Roy, 2014; Abeille and Lazaric, 2017), succeeded by a square-root regret bound for settings with a Euclidean action set (Hamidi and Bayati, 2020) and a logarithmic regret bound for stochastic contextual bandits with a shared reward parameter (Chakraborty et al., 2023). In particular, in the latter case (where the rewards of different arms share the unknown parameter), the regret of Thompson sampling can still be logarithmic in time if the observations are noisy versions of the stochastic context vectors and of the same dimension (Faradonbeh, 2021, 2022a).…”
Section: Introduction
confidence: 99%