2020
DOI: 10.1080/01621459.2020.1831925
Learning When-to-Treat Policies

Cited by 46 publications (38 citation statements)
References 45 publications
“…The problem of doubly robust policy evaluation in this setting has been considered by Thomas and Brunskill (2016) and Zhang, Tsiatis, Laber, and Davidian (2013). Nie, Brunskill, and Wager (2019) proposed a method for learning observational stopping rules from observational data that is both computationally feasible and robust to confounding. Obtaining a more comprehensive landscape of the problem of dynamic policy learning in observational studies would be of considerable interest.…”
Section: Discussion
confidence: 99%
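For reference, the doubly robust policy evaluation cited in the excerpt above combines an outcome model with inverse-propensity weighting. A minimal single-stage sketch of the estimator (the sequential versions in Thomas and Brunskill (2016) and Zhang et al. (2013) apply the same idea recursively over decision points) is:

\hat{V}_{\mathrm{DR}}(\pi) = \frac{1}{n}\sum_{i=1}^{n}\left[ \hat{\mu}\big(X_i,\pi(X_i)\big) + \frac{\mathbf{1}\{A_i=\pi(X_i)\}}{\hat{e}(X_i,A_i)}\big(Y_i-\hat{\mu}(X_i,A_i)\big) \right],

where \hat{\mu}(x,a) is a fitted outcome model and \hat{e}(x,a) a fitted propensity (behavior) model; the estimate remains consistent if either of the two models is correctly specified, which is the "doubly robust" property.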
“…Zhao et al (2011) attempted to optimize the timing to initiate second-line therapy in the context of clinical trials with two-stage treatments using Q-learning, but only considered two options (i.e., immediately or delayed after induction therapy). Other work on optimizing intervention timing with a fixed number of treatment options includes initiation of antiretroviral therapy in HIV (Robins et al, 2008), just-in-time adaptive interventions in mobile health (Nahum-Shani et al, 2018; Carpenter et al, 2020), and advantage doubly robust policy learning that optimizes when to treat (Nie et al, 2021). Guan et al (2019) developed a Bayesian nonparametric method that learns to recommend a regular recall time for patients with periodontal diseases.…”
Section: Why Not Use Existing Methods?
confidence: 99%
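To make the two-option timing setup described in that excerpt concrete, the sketch below runs backward-induction Q-learning over two stages with a binary "treat now vs. delay" action; the linear Q-model, data layout, and function names are illustrative assumptions, not the implementation of any of the cited papers.

import numpy as np
from sklearn.linear_model import Ridge

def fit_q(X, a, y):
    # Simple Q-model Q(x, a): main effects plus an x-by-a interaction.
    feats = np.hstack([X, a[:, None], X * a[:, None]])
    return Ridge(alpha=1.0).fit(feats, y)

def predict_q(model, X, a):
    # Predicted Q-values for every row of X under a fixed action a in {0, 1}.
    aa = np.full((X.shape[0], 1), float(a))
    feats = np.hstack([X, aa, X * aa])
    return model.predict(feats)

def two_stage_q_learning(X1, A1, X2, A2, Y):
    # Backward induction: fit the stage-2 Q-function on the final outcome,
    # then fit the stage-1 Q-function on the pseudo-outcome max_a Q2(x2, a).
    q2 = fit_q(X2, A2, Y)
    pseudo = np.maximum(predict_q(q2, X2, 0), predict_q(q2, X2, 1))
    q1 = fit_q(X1, A1, pseudo)
    return q1, q2

def treat_now(q1, x1):
    # Recommend immediate treatment (a = 1) if it beats delaying (a = 0) at stage 1.
    x1 = np.atleast_2d(x1)
    return bool(predict_q(q1, x1, 1)[0] >= predict_q(q1, x1, 0)[0])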
“…Off-policy evaluation (OPE) has been studied extensively across a range of different domains, from healthcare (Thapa et al, 2005; Raghu et al, 2018; Nie et al, 2019), to recommender systems (Li et al, 2010; Dudík et al, 2014), and robotics (Kalashnikov et al, 2018). While a full survey of OPE methods is outside the scope of this article, broadly speaking we can categorize OPE methods into groups based on the use of importance sampling (Precup, 2000), value functions (Sutton et al, 2009; Migliavacca et al, 2010; Sutton et al, 2016), and learned transition models (Paduraru, 2007), though a number of methods combine two or more of these components (Jiang & Li, 2015; Thomas & Brunskill, 2016; Munos et al, 2016).…”
Section: Related Work
confidence: 99%
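As a concrete anchor for the importance-sampling branch of that taxonomy (Precup, 2000), here is a minimal per-trajectory importance-sampling estimator; the trajectory format and names are illustrative assumptions, and weighted, per-decision, and doubly robust variants (Jiang & Li, 2015; Thomas & Brunskill, 2016) exist mainly to reduce its variance.

import numpy as np

def ordinary_is_estimate(trajectories, pi_b, pi_e, gamma=1.0):
    # Ordinary (per-trajectory) importance sampling: reweight each observed
    # return by the likelihood ratio of the target policy pi_e to the
    # behavior policy pi_b over the actions actually taken.
    estimates = []
    for traj in trajectories:              # traj: list of (state, action, reward)
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            weight *= pi_e(state, action) / pi_b(state, action)
            ret += discount * reward
            discount *= gamma
        estimates.append(weight * ret)
    return float(np.mean(estimates))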
“…The goal of this paper is to provide a standardized benchmark for evaluating OPE methods. Although considerable theoretical (Thomas & Brunskill, 2016; Swaminathan & Joachims, 2015; Jiang & Li, 2015; Wang et al, 2017) and practical progress (Gilotte et al, 2018; Nie et al, 2019; Kalashnikov et al, 2018) on OPE algorithms has been made in a range of different domains, there are few broadly accepted evaluation tasks that combine complex, high-dimensional problems commonly explored by modern deep reinforcement learning algorithms (Bellemare et al, 2013; Brockman et al, 2016) with standardized evaluation protocols and metrics. Our goal is to provide a set of tasks with a range of difficulty, exercise a variety of design properties, and provide policies with different behavioral patterns in order to establish a standardized framework for comparing OPE algorithms.…”
Section: Introduction
confidence: 99%