A General Theory of the Stochastic Linear Bandit and Its Applications

Hamidi, Nima; Bayati, Mohsen

doi:10.48550/arxiv.2002.05152

Cited by 1 publication

(6 citation statements)

References 0 publications


“…We note that the optimal dependence on d in both the upper and lower bounds are novel, which holds under the low dimensional regime d = O(log(T)/log log(T)), and relies on distributional assumptions on the contexts that relate the expected instant regret to the second moment of the arm parameters estimation error. Further, the elliptical potential lemma [36, Lemma 19.4], which is the main tool for the analysis of LinUCB [1,37,55,24,37], does not lead to the O(log(T)) upper bound for Tr-LinUCB, and a tailored analysis is required to show that information accumulates at a linear rate for each arm.…”

Section: Our Contributions (mentioning)

confidence: 99%

“…the optimal regret is O(1), achieved by the Greedy algorithm [8, Corollary 1] and the LinUCB algorithm [24, Remark 8.4], [55]. We note that if d is fixed, and X has a continuous component that has a bounded density, then the margin condition (i.e., α = 1) holds, and thus it has a wider applicability.…”

Section: More On Stochastic Linear Bandits (mentioning)

confidence: 99%

“…As discussed in the introduction, the exploration of the popular LinUCB algorithm [39,1,36,24,55] is excessive, which leads to its suboptimal performance. We propose to stop the LinUCB algorithm early, and perform pure exploitation afterwards; we call the proposed algorithm "Tr-LinUCB", where "Tr" is short for "Truncated".…”

Section: The Proposed Tr-LinUCB Algorithm (mentioning)

confidence: 99%

“…For more general linear bandits (see Subsection 1.2), "optimism in the face of uncertainty" is a popular design principle, which, for each t ≥ 1, chooses an arm A_t ∈ [K] that maximizes an upper bound UCB_t(k) on the potential reward θ_k^⊤ X_t [4,16,45,39,1,24,55]. Among this family, the LinUCB algorithm in [1] is perhaps the best known, and is near minimax optimal [36, Chapter 24].…”

Section: Introduction (mentioning)

confidence: 99%

“…Among this family, the LinUCB algorithm in [1] is perhaps the best known, and is near minimax optimal [36, Chapter 24]. In [24], in the framework under consideration, the LinUCB algorithm is shown to have a O(log²(T)) regret, and it was not clear whether the log(T) gap between this upper bound and the optimal rate, achieved by the OLS algorithm, does exist or is an artifact of the proof techniques in [24].…”

Section: Introduction (mentioning)

confidence: 99%


Truncated LinUCB for Stochastic Linear Bandits

Song, Zhou

2022

Preprint


This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed d-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension d and time horizon T, due to its over-exploration. A truncated version of LinUCB is proposed and termed "Tr-LinUCB", which follows LinUCB up to a truncation time S and performs pure exploitation afterwards. The Tr-LinUCB algorithm is shown to achieve O(d log(T)) regret if S = Cd log(T) for a sufficiently large constant C, and a matching lower bound is established, which shows the rate optimality of Tr-LinUCB in both d and T under a low dimensional regime. Further, if S = d log^κ(T) for some κ > 1, the loss compared to the optimal is a multiplicative log log(T) factor, which does not depend on d. This insensitivity to overshooting in choosing the truncation time of Tr-LinUCB is of practical importance.
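The truncation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `reward_fn(arm, x)` callback, the ridge parameter `lam`, and the fixed confidence width `beta` are all hypothetical choices made here for illustration, and the sketch assumes that the per-arm estimates keep being updated after the truncation time. Each arm keeps a ridge-regression estimate; up to `trunc_time` an exploration bonus is added to the estimated reward, and afterwards the arm with the highest estimate is chosen.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [dot(row, v) for row in M]


def sherman_morrison(A_inv, x):
    """Return (A + x x^T)^{-1} given a symmetric A^{-1}."""
    Ax = matvec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    d = len(x)
    return [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(d)]
            for i in range(d)]


def tr_linucb(contexts, reward_fn, n_arms, trunc_time, beta=1.0, lam=1.0):
    """Illustrative truncated LinUCB; returns the list of chosen arms.

    Assumptions (not from the source): beta is a fixed bonus width,
    lam is a ridge parameter, and estimates are updated in every round.
    """
    d = len(contexts[0])
    # Per-arm inverse Gram matrix (lam * I)^{-1} and response vector.
    V_inv = [[[1.0 / lam if i == j else 0.0 for j in range(d)]
              for i in range(d)] for _ in range(n_arms)]
    b = [[0.0] * d for _ in range(n_arms)]
    chosen = []
    for t, x in enumerate(contexts, start=1):
        scores = []
        for k in range(n_arms):
            theta_hat = matvec(V_inv[k], b[k])
            estimate = dot(theta_hat, x)
            # Exploration bonus only up to the truncation time.
            if t <= trunc_time:
                bonus = beta * math.sqrt(dot(x, matvec(V_inv[k], x)))
            else:
                bonus = 0.0
            scores.append(estimate + bonus)
        arm = max(range(n_arms), key=scores.__getitem__)
        reward = reward_fn(arm, x)
        # Rank-one update of the chosen arm's statistics.
        V_inv[arm] = sherman_morrison(V_inv[arm], x)
        b[arm] = [bi + reward * xi for bi, xi in zip(b[arm], x)]
        chosen.append(arm)
    return chosen
```

Setting `trunc_time=0` in this sketch reduces it to a purely greedy rule, while setting it to the number of rounds keeps the bonus on throughout, which gives a simple way to compare the two extremes on toy data.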
