2020
DOI: 10.48550/arxiv.2003.01704
Preprint

Model Selection in Contextual Stochastic Bandit Problems

Abstract: We study model selection in stochastic bandit problems. Our approach relies on a master algorithm that selects its actions among candidate base algorithms. While this problem has been studied for specific classes of stochastic base algorithms, our objective is to provide a method that can work with more general classes of stochastic base algorithms. We propose a master algorithm inspired by CORRAL (Agarwal et al., 2017) and introduce a novel and generic smoothing transformation for stochastic bandit algorithms that …
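
To make the master/base structure described in the abstract concrete, below is a minimal, hypothetical Python sketch of such an aggregation loop. The `base_algorithms` objects, their `choose`/`update` interface, and the EXP3-style exponential-weights update are all assumptions made for illustration; the paper's actual master follows CORRAL's log-barrier mirror-descent update and additionally applies its smoothing transformation to the base algorithms, neither of which is reproduced here.

```python
import numpy as np

class MasterOverBases:
    """Hypothetical sketch of a master/base aggregation loop: each round the
    master samples one base bandit algorithm, plays its action, and updates
    itself with an importance-weighted loss. CORRAL proper uses a log-barrier
    online-mirror-descent update with increasing learning rates; the
    EXP3-style exponential-weights update below only keeps the sketch short."""

    def __init__(self, base_algorithms, eta=0.1):
        self.bases = base_algorithms               # assumed to expose choose()/update()
        self.eta = eta
        self.log_w = np.zeros(len(base_algorithms))

    def _probs(self):
        w = np.exp(self.log_w - self.log_w.max())  # numerically stable softmax over bases
        return w / w.sum()

    def step(self, context, play):
        p = self._probs()
        i = np.random.choice(len(self.bases), p=p)   # sample a base algorithm
        action = self.bases[i].choose(context)       # the sampled base picks the action
        loss = play(action)                          # environment returns a loss in [0, 1]
        self.bases[i].update(context, action, loss)  # only the sampled base observes feedback
        self.log_w[i] -= self.eta * loss / p[i]      # importance weighting keeps estimates unbiased
        return action, loss
```

A caller would instantiate this with concrete base bandit algorithms and invoke `step` once per round, passing the current context and a callback that plays the chosen action and returns the observed loss.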

Cited by 12 publications (36 citation statements)
References 2 publications
“…Existing approaches to address either the stronger Objective 1 [9] or the weaker Objective 2 [16] make restrictive assumptions regarding the conditioning (what we will call diversity) of the contexts. Other, more data-agnostic approaches [3,32,31,28] achieve neither of the above objectives. This leads us to ask whether we can design a universal model selection approach that is data-agnostic (other than requiring a probability model on the contexts) and achieves either Objective 1 or 2.…”
Section: Introduction (mentioning)
confidence: 99%
“…[14] introduce a new family of algorithms that require access to an online oracle for square-loss regression and address the case of adversarial contexts. Concurrent work of [33] solves the case where contexts / action sets are stochastic. Both works ([14] and [33]) leverage CORRAL-type aggregation [2] of contextual bandit algorithms and achieve the optimal Õ(√d·Tε + d√T) regret bound.…”
Section: Introduction (mentioning)
confidence: 99%
“…Concurrent work of [33] solves the case where contexts / action sets are stochastic. Both works ([14] and [33]) leverage CORRAL-type aggregation [2] of contextual bandit algorithms and achieve the optimal Õ(√d·Tε + d√T) regret bound. Finally, in [32], the authors present a practical master algorithm that plays base algorithms that come with a candidate regret bound that may not hold during all rounds.…”
Section: Introduction (mentioning)
confidence: 99%