2021
DOI: 10.1073/pnas.2014602118

Confidence intervals for policy evaluation in adaptive experiments

Abstract: Adaptive experimental designs can dramatically improve efficiency in randomized trials. But with adaptively collected data, common estimators based on sample means and inverse propensity-weighted means can be biased or heavy-tailed. This poses statistical challenges, in particular when the experimenter would like to test hypotheses about parameters that were not targeted by the data-collection mechanism. In this paper, we present a class of test statistics that can handle these challenges. Our approach is to a…
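As a rough illustration of the bias and heavy-tail issues the abstract describes (a hypothetical simulation, not the paper's proposed estimator), the Python sketch below runs a two-armed Bernoulli experiment under Thompson sampling and compares the naive per-arm sample mean with a simple inverse propensity-weighted (IPW) estimate. The arm means, horizon, and Monte Carlo propensity approximation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed Bernoulli experiment; values are illustrative only.
true_means = np.array([0.5, 0.6])
T, reps = 300, 500

naive_err, ipw_err = [], []
for _ in range(reps):
    succ, fail = np.ones(2), np.ones(2)          # Beta(1, 1) posteriors
    arm_rewards = [[], []]
    ipw_sum = np.zeros(2)
    for t in range(T):
        arm = int(np.argmax(rng.beta(succ, fail)))   # Thompson draw
        # Approximate the assignment probability e_t by Monte Carlo,
        # so an inverse propensity-weighted (IPW) term can be formed.
        sims = rng.beta(succ, fail, size=(100, 2))
        e_t = max((sims.argmax(axis=1) == arm).mean(), 1e-3)
        r = rng.binomial(1, true_means[arm])
        succ[arm] += r
        fail[arm] += 1 - r
        arm_rewards[arm].append(r)
        ipw_sum[arm] += r / e_t
    naive = np.array([np.mean(rs) if rs else np.nan for rs in arm_rewards])
    naive_err.append(naive - true_means)
    ipw_err.append(ipw_sum / T - true_means)

# The naive per-arm sample mean is generally biased under adaptive
# assignment; IPW is approximately unbiased but far more variable.
print("sample-mean bias:", np.round(np.nanmean(naive_err, axis=0), 3))
print("IPW bias:        ", np.round(np.mean(ipw_err, axis=0), 3))
print("IPW std dev:     ", np.round(np.std(ipw_err, axis=0), 3))
```

In runs of this kind, the sample mean of the less-assigned arm is typically biased while the IPW estimate is roughly unbiased but much noisier, which is the trade-off the abstract points to.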
Cited by 46 publications (56 citation statements); references 25 publications.
“…Examples include data acquisition policies 28 , methods for handling missing values in predictive tasks 29 , adjusting for self-selection in large-scale human impact studies 30 , and identifying heterogeneous treatment effects 31 . Further examples include evaluating the impact of behavioral data size and richness on predictive ability 32 ; proposing and comparing model explainability approaches 33 and alternatives 34 ; developing sampling designs from networked data 35 and unbiased treatment effect estimators in adaptive experiments 36 ; and translating legal notions of discrimination into automated algorithmic solutions 37 .…”
Section: The Role Of Academic Data Science Research In Advancing Knowledge
confidence: 99%
“…The use of BMOD by platforms can interfere with research efforts by masking, changing, and even overriding effects of interest. For this reason, there is a growing niche of research focused on developing unbiased training and evaluation procedures using ideas drawn from causal inference 36,66 . As noted, in observational studies, users' BBD alone is insufficient for answering causal questions about human behavior and possible feedback loops between human behaviors and algorithm learning and actions.…”
Section: New Needs For Conducting Scientific Research
confidence: 99%
“…But MAB heuristics pose a problem for an experimenter interested in estimating the effects of all treatments: if the experimenter is quickly convinced that a particular treatment is suboptimal, she should stop assigning it in the future. As a result, the experimenter might miss out on learning about the effectiveness of good, though suboptimal, policies; the resulting inferential problems are discussed in Hadad et al (2019).…”
Section: Tempered Thompson Algorithm Within A Hierarchical Bayesian Model
confidence: 99%
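The citing passage above describes how bandit-style allocation starves apparently suboptimal arms of data. A minimal sketch of that mechanism (assuming Bernoulli Thompson sampling with illustrative arm means, not the cited paper's exact design) follows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical treatment arms with similar, illustrative success rates.
true_means = np.array([0.45, 0.50, 0.55])
T = 2000

succ, fail = np.ones(3), np.ones(3)              # Beta(1, 1) posteriors
counts = np.zeros(3, dtype=int)
for t in range(T):
    arm = int(np.argmax(rng.beta(succ, fail)))   # Thompson sampling draw
    r = rng.binomial(1, true_means[arm])
    succ[arm] += r
    fail[arm] += 1 - r
    counts[arm] += 1

print("observations per arm, Thompson sampling:", counts)
print("observations per arm, uniform design:   ", [T // 3] * 3)
# The near-optimal arms typically receive only a small share of the sample,
# so their effects are estimated imprecisely -- the inferential problem the
# citing passage attributes to bandit-style designs.
```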
“…One worry about adaptive experimental designs is that they lead to biased inference (see for instance Hadad et al 2019). Item 4 of Theorem 1 implies, however, that this is not the case for the Tempered Thompson Algorithm in large samples.…”
Section: Inference
confidence: 99%
“…Armed with posterior predictive estimates of the welfare gain or loss distribution for each subject and each choice, can we adaptively identify when to withdraw the insurance product from these persistent losers, and thereby avoid them incurring such large welfare losses? Important recent research by Caria et al [2020], Hadad et al [2020] and Kasy and Sautmann [2019] considers this general issue. The challenges are significant, from the effects on inference about confidence intervals, to the implications for optimal sampling intensity, to the weight to be given to multiple treatment arms, and so on.…”
Section: Adaptive Welfare Evaluation
confidence: 99%