2021
DOI: 10.48550/arxiv.2105.02344
Preprint

Policy Learning with Adaptively Collected Data

Abstract: Learning optimal policies from historical data enables the gains from personalization to be realized in a wide variety of applications. The growing policy learning literature focuses on a setting where the treatment assignment policy does not adapt to the data. However, adaptive data collection is becoming more common in practice, from two primary sources: 1) data collected from adaptive experiments that are designed to improve inferential efficiency; 2) data collected from production systems that are adaptive…

Cited by 4 publications (13 citation statements)
References 51 publications
“…Without careful causal methods, this can also lead to feedback loops. Recent work has explored building causal mechanisms into SDM algorithms [35,43,33,18,56,39]. But more work is needed to infer causal mechanisms in the face of challenges described above.…”
Section: Causal Inference
confidence: 99%
“…Kato [2021a,b] propose a doubly-robust estimator for off-policy evaluation with dependent samples. Zhan et al [2021] provide regret bounds for learning an optimal policy using adaptively collected data, where the probability of selecting an action is a function of past data. Zhang et al [2021a,b] study statistical inference for OLS and M-estimation with non-i.i.d.…”
Section: Related Work
confidence: 99%
“…Thus we opted for deriving a uniform concentration bound by modifying the classical uniform LLN proof. Zhan et al [2021] also derive a uniform LLN without requiring boundedness of the martingale difference terms, but with structural assumptions on the summands related to their specific application.…”
Section: A3 Proof of Theorem 1 (Regret of OMS-ETC)
confidence: 99%
“…Policy learning with adaptive data. Zhan et al [58] study policy learning from contextual-bandit data by optimizing a doubly robust policy value estimator stabilized by a deterministic lower bound on IS weights. They provide regret guarantees for this algorithm based on invoking the results of Rakhlin et al [45].…”
Section: Example 2 (Classification), in the Same Setting As
confidence: 99%
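To make the quoted technique concrete: a doubly robust policy value estimate combines an outcome model with an importance-sampling (IS) correction, and clipping the propensity from below by a deterministic constant keeps the IS weights bounded. The sketch below is illustrative only, with synthetic data and a hypothetical exact outcome model; it is not the estimator or the data-dependent weighting of Zhan et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged bandit data (hypothetical setup): contexts x,
# binary actions a, rewards r, and the logging propensities
# recorded at collection time.
n = 1000
x = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-x))          # logging policy P(a=1|x)
a = rng.binomial(1, propensity)
r = a * x + rng.normal(scale=0.1, size=n)      # true mean reward: a*x

# Outcome model mu_hat(x, a): taken to be exact here for simplicity
# (an assumption); in practice it would be fit by regression.
def mu_hat(x, a):
    return a * x

# Deterministic target policy to evaluate: treat when x > 0.
pi = (x > 0).astype(float)

# Doubly robust estimate with IS weights stabilized by a
# deterministic lower bound eps on the propensity (clipping).
eps = 0.05
pscore_a = np.where(a == 1, propensity, 1.0 - propensity)
pi_a = np.where(a == 1, pi, 1.0 - pi)          # target prob. of observed a
w = pi_a / np.maximum(pscore_a, eps)           # bounded by 1/eps
dr = mu_hat(x, 1) * pi + mu_hat(x, 0) * (1 - pi) + w * (r - mu_hat(x, a))
print(round(float(dr.mean()), 3))
```

With a standard normal context, the true value of "treat when x > 0" is E[x · 1{x > 0}] ≈ 0.40, and the estimate should land near that; the clipping bounds every weight by 1/eps, trading a small bias for controlled variance.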
“…These are in general not comparable. Foster and Krishnamurthy [20] and Zhan et al [58] use sequential L∞ and Lp covering numbers, respectively, to obtain maximal inequalities. van de Geer [55, Chapter 8] gives guarantees for ERM over nonparametric classes of controlled sequential bracketing entropy.…”
Section: Example 2 (Classification), in the Same Setting As
confidence: 99%