Learning When-to-Treat Policies

Nie, Xinkun; Brunskill, Emma; Wager, Stefan

doi:10.1080/01621459.2020.1831925

Cited by 46 publications (38 citation statements)

References 45 publications

“…The problem of doubly robust policy evaluation in this setting has been considered by Thomas and Brunskill (2016) and Zhang, Tsiatis, Laber, and Davidian (2013). Nie, Brunskill, and Wager (2019) proposed a method for learning observational stopping rules from observational data that is both computationally feasible and robust to confounding. Obtaining a more comprehensive landscape of the problem of dynamic policy learning in observational studies would be of considerable interest.…”

Section: Discussion (mentioning, confidence: 99%)

Policy Learning With Observational Data

Athey and Wager, 2021, Econometrica (ECTA), self-cite

In many areas, practitioners seek to use observational data to learn a treatment assignment policy that satisfies application‐specific constraints, such as budget, fairness, simplicity, or other functional form constraints. For example, policies may be restricted to take the form of decision trees based on a limited set of easily observable individual characteristics. We propose a new approach to this problem motivated by the theory of semiparametrically efficient estimation. Our method can be used to optimize either binary treatments or infinitesimal nudges to continuous treatments, and can leverage observational data where causal effects are identified using a variety of strategies, including selection on observables and instrumental variables. Given a doubly robust estimator of the causal effect of assigning everyone to treatment, we develop an algorithm for choosing whom to treat, and establish strong guarantees for the asymptotic utilitarian regret of the resulting policy.
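As an illustration of the doubly robust scoring idea in this abstract, here is a minimal Python sketch, not the paper's full algorithm. It assumes nuisance estimates (propensity `e_hat`, outcome models `mu0_hat` and `mu1_hat`) are already available, ideally cross-fitted, forms AIPW scores Gamma_i, and then picks the best rule from a deliberately tiny, hypothetical policy class ("treat iff one covariate x >= c") by maximizing an objective of the form (1/n) * sum_i (2 * pi(X_i) - 1) * Gamma_i. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def aipw_scores(y: Sequence[float], w: Sequence[int], e_hat: Sequence[float],
                mu0_hat: Sequence[float], mu1_hat: Sequence[float]) -> List[float]:
    """Doubly robust (AIPW) score Gamma_i for the effect of treating unit i.

    Nuisance estimates are taken as given; in practice they would be
    cross-fitted so that unit i's estimates do not use unit i's outcome.
    """
    scores = []
    for yi, wi, e, m0, m1 in zip(y, w, e_hat, mu0_hat, mu1_hat):
        mu_w = m1 if wi == 1 else m0  # outcome model at the observed arm
        # Regression estimate plus inverse-propensity-weighted residual correction.
        scores.append(m1 - m0 + (wi - e) / (e * (1.0 - e)) * (yi - mu_w))
    return scores


def best_threshold_rule(x: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Search the hypothetical class 'treat iff x >= c' for the c maximizing
    (1/n) * sum_i (2 * pi(x_i) - 1) * Gamma_i.

    c = min(x) treats everyone; c = inf treats no one.
    """
    n = len(x)
    candidates = sorted(set(x)) + [float("inf")]
    best_c, best_val = float("inf"), float("-inf")
    for c in candidates:
        val = sum((1 if xi >= c else -1) * g for xi, g in zip(x, scores)) / n
        if val > best_val:
            best_c, best_val = c, val
    return best_c, best_val
```

For example, with outcomes y = [-1, 1, 1, -1], treatments w = [1, 0, 1, 0], e_hat = 0.5 and both outcome models set to zero, the scores are [-2, -2, 2, 2]; on x = [0, 1, 2, 3] the search returns threshold 2 with estimated advantage 2.0. A richer policy class (such as shallow decision trees) would replace the exhaustive threshold scan.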

“…Zhao et al (2011) attempted to optimize the timing to initiate second-line therapy in the context of clinical trials with two-stage treatments using Q-learning, but only considered two options (i.e., immediately or delayed after induction therapy). Other work on optimizing intervention timing with a fixed number of treatment options includes initiation of antiretroviral therapy in HIV (Robins et al, 2008), just-in-time adaptive interventions in mobile health (Nahum-Shani et al, 2018; Carpenter et al, 2020), and advantage doubly robust policy learning that optimizes when to treat (Nie et al, 2021). Guan et al (2019) developed a Bayesian nonparametric method that learns to recommend a regular recall time for patients with periodontal diseases.…”

Section: Why Not Use Existing Methods? (mentioning, confidence: 99%)

Personalized Dynamic Treatment Regimes in Continuous Time: A Bayesian Approach for Optimizing Clinical Decisions with Timing

et al. 2022

Accurate models of clinical actions and their impacts on disease progression are critical for estimating personalized optimal dynamic treatment regimes (DTRs) in medical/health research, especially in managing chronic conditions. Traditional statistical methods for DTRs usually focus on estimating the optimal treatment or dosage at each given medical intervention, but overlook the important question of "when this intervention should happen." We fill this gap by developing a two-step Bayesian approach to optimize clinical decisions with timing. In the first step, we build a generative model for a sequence of medical interventions (which are discrete events in continuous time) with a marked temporal point process (MTPP), where the mark is the assigned treatment or dosage. Then this clinical action model is embedded into a Bayesian joint framework where the other components model clinical observations including longitudinal medical measurements and time-to-event data conditional on treatment histories. In the second step, we propose a policy gradient method to learn the personalized optimal clinical decision that maximizes the patient survival by interacting the MTPP with the model on clinical observations while accounting for uncertainties in clinical observations learned from the posterior inference of the Bayesian joint model in the first step. A signature application of the proposed approach is to schedule follow-up visitations and assign a dosage at each visitation for patients after kidney transplantation. We evaluate our approach with comparison to alternative methods on both simulated and real-world datasets. In our experiments, the personalized decisions made by the proposed method are clinically useful: they are interpretable and successfully help improve patient survival.

“…Off-policy evaluation (OPE) has been studied extensively across a range of different domains, from healthcare (Thapa et al, 2005; Raghu et al, 2018; Nie et al, 2019), to recommender systems (Li et al, 2010; Dudík et al, 2014; …), and robotics (Kalashnikov et al, 2018). While a full survey of OPE methods is outside the scope of this article, broadly speaking we can categorize OPE methods into groups based on the use of importance sampling (Precup, 2000), value functions (Sutton et al, 2009; Migliavacca et al, 2010; Sutton et al, 2016; …), and learned transition models (Paduraru, 2007), though a number of methods combine two or more of these components (Jiang & Li, 2015; Thomas & Brunskill, 2016; Munos et al, 2016).…”

Section: Related Work (mentioning, confidence: 99%)
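To make the importance-sampling family of OPE methods concrete, here is a minimal Python sketch of trajectory-wise importance sampling and its self-normalized (weighted) variant, written for illustration rather than taken from any of the cited papers. It assumes each logged step records the behavior policy's probability of the chosen action and that the target policy's action probabilities can be queried; `ratio_and_return` and `is_estimates` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple

# One logged step: (state, action, reward, behavior probability of that action).
Transition = Tuple[int, int, float, float]
TargetProb = Callable[[int, int], float]  # target_prob(action, state) -> pi_e(action | state)


def ratio_and_return(traj: Sequence[Transition], target_prob: TargetProb,
                     gamma: float) -> Tuple[float, float]:
    """Cumulative importance ratio prod_t pi_e/pi_b and discounted return of one trajectory."""
    ratio, ret = 1.0, 0.0
    for t, (s, a, r, b) in enumerate(traj):
        ratio *= target_prob(a, s) / b
        ret += gamma ** t * r
    return ratio, ret


def is_estimates(trajs: Sequence[Sequence[Transition]], target_prob: TargetProb,
                 gamma: float = 1.0) -> Tuple[float, float]:
    """Return (ordinary IS, weighted IS) estimates of the target policy's value."""
    pairs = [ratio_and_return(tr, target_prob, gamma) for tr in trajs]
    weighted_returns = sum(w * g for w, g in pairs)
    ordinary = weighted_returns / len(pairs)
    total_weight = sum(w for w, _ in pairs)
    # Self-normalized version; undefined if no trajectory is consistent with pi_e.
    weighted = weighted_returns / total_weight if total_weight > 0 else float("nan")
    return ordinary, weighted
```

The ordinary estimator is unbiased when the logged behavior probabilities are correct but can have high variance when the ratios are large; the weighted variant accepts a small bias in exchange for lower variance.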

“…The goal of this paper is to provide a standardized benchmark for evaluating OPE methods. Although considerable theoretical (Thomas & Brunskill, 2016; Swaminathan & Joachims, 2015; Jiang & Li, 2015; Wang et al, 2017; …) and practical progress (Gilotte et al, 2018; Nie et al, 2019; Kalashnikov et al, 2018) on OPE algorithms has been made in a range of different domains, there are few broadly accepted evaluation tasks that combine complex, high-dimensional problems commonly explored by modern deep reinforcement learning algorithms (Bellemare et al, 2013; Brockman et al, 2016) with standardized evaluation protocols and metrics. Our goal is to provide a set of tasks that span a range of difficulty, exercise a variety of design properties, and provide policies with different behavioral patterns, in order to establish a standardized framework for comparing OPE algorithms.…”

Section: Introduction (mentioning, confidence: 99%)
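Several of the statements above refer to doubly robust off-policy evaluation in sequential settings (e.g., Jiang & Li, 2015; Thomas & Brunskill, 2016), which combines a fitted value model with importance weights. Below is a minimal Python sketch of an unweighted, per-decision doubly robust estimator in the style of Jiang & Li; it is not the estimator of any specific paper listed here, and Thomas & Brunskill's weighted variant differs. It assumes logged trajectories record the behavior policy's probability of each chosen action and that a fitted, time-indexed Q-function `q_hat` is supplied; `Step`, `dr_trajectory_value`, and `dr_estimate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Step:
    state: int
    action: int
    reward: float
    behavior_prob: float  # logged pi_b(action | state), assumed known and > 0


TargetProb = Callable[[int, int], float]  # target_prob(a, s) -> pi_e(a | s)
QModel = Callable[[int, int, int], float]  # q_hat(t, s, a) -> fitted Q at step t


def dr_trajectory_value(traj: Sequence[Step], actions: Sequence[int],
                        target_prob: TargetProb, q_hat: QModel,
                        gamma: float = 1.0) -> float:
    """Per-decision doubly robust value of one trajectory, computed backwards:
    V_t = v_hat_t(s_t) + rho_t * (r_t + gamma * V_{t+1} - q_hat(t, s_t, a_t)),
    starting from V_H = 0, with rho_t = pi_e(a_t | s_t) / pi_b(a_t | s_t).
    """
    v_next = 0.0
    for t in range(len(traj) - 1, -1, -1):
        step = traj[t]
        # Model-based baseline: fitted Q averaged over the target policy's actions.
        v_hat = sum(target_prob(a, step.state) * q_hat(t, step.state, a) for a in actions)
        rho = target_prob(step.action, step.state) / step.behavior_prob
        # Importance-weighted correction for the model's error at step t.
        v_next = v_hat + rho * (step.reward + gamma * v_next
                                - q_hat(t, step.state, step.action))
    return v_next


def dr_estimate(trajs: Sequence[Sequence[Step]], actions: Sequence[int],
                target_prob: TargetProb, q_hat: QModel, gamma: float = 1.0) -> float:
    """Average the per-trajectory doubly robust values."""
    values = [dr_trajectory_value(tr, actions, target_prob, q_hat, gamma) for tr in trajs]
    return sum(values) / len(values)
```

With `q_hat` identically zero the recursion reduces to per-decision importance sampling: a single step with reward 1.0, behavior probability 0.5, and a target policy that always takes the logged action gives 2.0, while a `q_hat` equal to 1.0 for that action gives 1.0. When the behavior probabilities are correct and `q_hat` is fitted on separate data, the correction terms keep the estimate unbiased, and a good `q_hat` typically lowers variance relative to pure importance sampling.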