2020
DOI: 10.48550/arxiv.2002.05152
Preprint

A General Theory of the Stochastic Linear Bandit and Its Applications

citations
Cited by 1 publication
(6 citation statements)
references
References 0 publications
“…We note that the optimal dependence on d in both the upper and lower bounds are novel, which holds under the low dimensional regime d = O(log(T)/log log(T)), and relies on distributional assumptions on the contexts that relate the expected instant regret to the second moment of the arm parameters estimation error. Further, the elliptical potential lemma [36, Lemma 19.4], which is the main tool for the analysis of LinUCB [1, 37, 55, 24, 37], does not lead to the O(log(T)) upper bound for Tr-LinUCB, and a tailored analysis is required to show that information accumulates at a linear rate for each arm.…”
Section: Our Contributions | mentioning
confidence: 99%
“…the optimal regret is O(1), achieved by the Greedy algorithm [8, Corollary 1] and the LinUCB algorithm [24, Remark 8.4], [55]. We note that if d is fixed, and X has a continuous component that has a bounded density, then the margin condition (i.e., α = 1) holds, and thus it has a wider applicability.…”
Section: More On Stochastic Linear Bandits | mentioning
confidence: 99%