2020
DOI: 10.48550/arxiv.2002.03217
Preprint

Inference for Batched Bandits

Kelly W. Zhang,
Lucas Janson,
Susan A. Murphy

citations
Cited by 6 publications
(16 citation statements)
references
References 0 publications
“…Since the optimal policy is unknown, we estimate the optimal policy from the online data as $\pi_t$, to infer the value of interest. As commonly assumed in the current online inference literature (see e.g., Deshpande et al, 2018; Zhang et al, 2020; Chen et al, 2020) and the bandit literature (see e.g., Chu et al, 2011; Abbasi-Yadkori et al, 2011; Bubeck and Cesa-Bianchi, 2012; Zhou, 2015), we consider the conditional mean outcome function takes a linear form, i.e., $\mu(x, a) = x^\top \beta(a)$, where $\beta(\cdot)$ is a smooth function, which can be estimated via a ridge regression based on $H_{t-1}$ as…”
Section: Framework (mentioning)
confidence: 75%
“…as the number of pulls for action $a$, $y_{t-1}(a)$ is the $N_{t-1}(a) \times 1$ vector of the outcomes received under action $a$ at time $t - 1$, and $\omega$ is the regularization term. There are two main reasons to choose the ridge estimator instead of the ordinary least square estimator that is considered in Deshpande et al (2018); Zhang et al (2020); Chen et al (2020). First, the ridge estimator is well defined when $D_{t-1}(a)^\top D_{t-1}(a)$ is singular, and its bias is negligible when the time step is large.…”
Section: Framework (mentioning)
confidence: 99%
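
The point made in the statement above, that a ridge estimator stays well defined when the Gram matrix is singular, can be illustrated with a minimal sketch. The excerpt elides the citing paper's exact expression, so the code assumes the textbook ridge form $(D^\top D + \omega I)^{-1} D^\top y$ for a single action. The function name `ridge_estimate` and the toy data are hypothetical; `D`, `y`, and `omega` follow the symbols in the quoted text.

```python
import numpy as np


def ridge_estimate(D, y, omega=1.0):
    """Textbook ridge estimate (D^T D + omega * I)^{-1} D^T y for one action.

    Assumption: this is the standard ridge form, not necessarily the exact
    expression used in the citing paper, which the excerpt elides.
    """
    d = D.shape[1]
    gram = D.T @ D
    # Adding omega * I makes the system positive definite for omega > 0,
    # so it can be solved even when the Gram matrix itself is singular.
    return np.linalg.solve(gram + omega * np.eye(d), D.T @ y)


# Hypothetical toy data: two pulls of one action with identical contexts,
# so the 2x2 Gram matrix has rank 1 and ordinary least squares has no
# unique solution.
D = np.array([[1.0, 2.0],
              [1.0, 2.0]])
y = np.array([3.0, 3.0])

gram_rank = np.linalg.matrix_rank(D.T @ D)
beta_hat = ridge_estimate(D, y, omega=1.0)
print("rank of Gram matrix:", gram_rank)
print("ridge estimate:", beta_hat)
```

With these contexts the Gram matrix has rank 1, so inverting it directly would fail, while the regularized system still has a unique solution. Shrinking `omega` toward zero reduces the bias of the estimate but brings the system closer to the singular case.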