2017
DOI: 10.1080/01621459.2016.1155993
Interactive Q-Learning for Quantiles

Abstract: A dynamic treatment regime is a sequence of decision rules, each of which recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive estimators of decision rules for optimizing probabilities and quantiles computed with respect to the response distribution for t…

Cited by 43 publications (34 citation statements) · References 50 publications
“…Then, the conditional quantile-based optimal individualized treatment rule is defined as $g_\tau^{\mathrm{opt}}(x) = \arg\max_{a \in \mathcal{A}} Q_\tau(x, a)$, $\tau \in (0, 1)$. For a conditional quantile-based treatment rule $g$, the value function is defined as $V_\tau(g) = E_X[Q_\tau\{X, g(X)\}]$ and $g_\tau^{\mathrm{opt}} = \arg\max_g V_\tau(g)$. It is noted that our defined value function is different from those recently studied in the literature. Specifically, they considered the marginal cumulative distribution function of the potential outcome, $F_{Y_i(a)}(y) = \mathrm{pr}\{Y_i(a) \le y\}$.…”
Section: New Optimal Treatment Estimation Framework: Robust Regression
confidence: 99%
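The quantile-based rule in the excerpt above can be sketched numerically. This is a minimal illustration only, not the cited authors' estimator: the simulated design, the k-nearest-neighbor quantile estimator, and all variable names (`q_tau`, `g_opt`) are assumptions made for the example.

```python
import numpy as np

# Illustrative simulated data: binary treatment whose tau-quantile
# effect flips sign with the covariate x.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)                 # patient covariate
A = rng.integers(0, 2, size=n)         # randomized treatment in {0, 1}
Y = X * (2 * A - 1) + rng.normal(size=n)  # response

tau = 0.5

def q_tau(x, a, k=100):
    """k-NN estimate of the conditional tau-quantile Q_tau(x, a):
    the empirical tau-quantile of Y among the k patients in arm `a`
    whose covariate is closest to x."""
    idx = np.where(A == a)[0]
    nearest = idx[np.argsort(np.abs(X[idx] - x))[:k]]
    return np.quantile(Y[nearest], tau)

def g_opt(x):
    """Estimated rule g_tau^opt(x) = argmax_a Q_tau(x, a)."""
    return max((0, 1), key=lambda a: q_tau(x, a))

print(g_opt(1.5))   # under this design, treatment 1 for x > 0
print(g_opt(-1.5))  # and treatment 0 for x < 0
```

The value function $V_\tau(g)$ from the excerpt could then be approximated by averaging `q_tau(x, g_opt(x))` over the covariate sample.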
“…() and Linn et al. () (see also chapter 7 of Chakraborty and Moodie, ). Such examples are rare, however, and typically grounded in methods that do not offer a great deal of flexibility or robustness in modeling.…”
Section: Introduction
confidence: 99%
“…Examples include Murphy (2003), Robins (2004), Henderson et al. (2010), and Henderson et al. (2011). In Q-learning, where Q is taken from quality, the response itself is modelled at each decision time as a function of history to date, and optimal actions are determined sequentially (Laber et al., 2014; Moodie et al., 2014; Wallace and Moodie, 2015; Song et al., 2015; Linn et al., 2017). A- and Q-learning are reviewed by Chakraborty and Moodie (2013) and Schulte et al. (2014).…”
Section: Introduction
confidence: 99%
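The Q-learning recipe described in this excerpt (model the response at each decision time as a function of history, then determine optimal actions by backward induction) can be sketched with ordinary least squares. The two-stage simulated data, the linear Q-function features, and the names `q1`, `q2`, `d1`, `d2` below are illustrative assumptions, not any cited paper's specification.

```python
import numpy as np

# Illustrative two-stage simulated trial.
rng = np.random.default_rng(1)
n = 2000
X1 = rng.normal(size=n)                 # baseline covariate
A1 = rng.integers(0, 2, size=n)         # stage-1 treatment
X2 = 0.5 * X1 + rng.normal(size=n)      # intermediate covariate
A2 = rng.integers(0, 2, size=n)         # stage-2 treatment
Y = X1 * (2 * A1 - 1) + X2 * (2 * A2 - 1) + rng.normal(size=n)

def ols(Z, y):
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

# Stage 2: model the response given full history and the stage-2 action,
# with treatment-by-covariate interaction terms.
Z2 = np.column_stack([np.ones(n), X1, A1, X2, A2, X1 * A1, X2 * A2])
b2 = ols(Z2, Y)

def q2(x1, a1, x2, a2):
    return (b2[0] + b2[1] * x1 + b2[2] * a1 + b2[3] * x2
            + b2[4] * a2 + b2[5] * x1 * a1 + b2[6] * x2 * a2)

# Pseudo-outcome: the value of acting optimally at stage 2.
V2 = np.maximum(q2(X1, A1, X2, 0), q2(X1, A1, X2, 1))

# Stage 1: regress the pseudo-outcome on stage-1 history and action.
Z1 = np.column_stack([np.ones(n), X1, A1, X1 * A1])
b1 = ols(Z1, V2)

# Estimated decision rules: treat (a = 1) when the fitted Q is higher.
d2 = lambda x2: int(b2[4] + b2[6] * x2 > 0)
d1 = lambda x1: int(b1[2] + b1[3] * x1 > 0)
```

Under this design both fitted rules recommend treatment exactly when the relevant covariate is positive, which matches the data-generating interactions.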