2022
DOI: 10.1287/opre.2021.2249

Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning

Abstract (Demystifying the Curse of Horizon in Offline Reinforcement Learning in Order to Break It): Offline reinforcement learning (RL), where we evaluate and learn new policies using existing off-policy data, is crucial in applications where experimentation is challenging and simulation unreliable, such as medicine. It is also notoriously difficult because the similarity (density ratio) between observed trajectories and those generated by any new policy diminishes exponentially as the horizon grows, a phenomenon known as the curse of horizon. …
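To make the curse of horizon concrete, here is a minimal simulation. It is not from the paper; the two-action policies pi_b and pi_e are hypothetical. It shows how the cumulative importance weight, i.e. the trajectory density ratio between the evaluation and behavior policies, degenerates as the horizon H grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-action example: the behavior policy is uniform,
# the evaluation policy strongly prefers action 0.
pi_b = np.array([0.5, 0.5])  # behavior (logging) policy
pi_e = np.array([0.9, 0.1])  # evaluation (target) policy

def cumulative_weights(horizon, n_traj=100_000):
    """Per-trajectory product of the per-step ratios pi_e(a) / pi_b(a)."""
    actions = rng.choice(2, size=(n_traj, horizon), p=pi_b)
    return (pi_e[actions] / pi_b[actions]).prod(axis=1)

for horizon in (1, 5, 20, 50):
    w = cumulative_weights(horizon)
    print(f"H={horizon:>2}  mean={w.mean():8.4f}  std={w.std():12.4f}")
```

In theory the weights have mean one at every horizon, but the per-step second moment here is 0.5(1.8² + 0.2²) = 1.64, so the weight variance compounds like 1.64^H. In the simulation the standard deviation explodes and the empirical mean itself collapses at long horizons, because the rare trajectories that carry the expectation are essentially never sampled: this is the curse of horizon that plain sequential importance sampling suffers from.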

Cited by 39 publications (39 citation statements). References 23 publications.
“…In particular, if the amount of time necessary to collect outcome Y(t) in O(t) is long, then generating a long time series would take too much time to be practically useful. If one is interested in causal effects on a long-term outcome and is willing to forgo utilizing known randomization probabilities for treatment, we advocate for the marginal target parameters as described in previous work by […] or Kallus & Uehara (2019).…”

Section: Discussion (citation type: mentioning; confidence: 99%)
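The "marginal target parameters" mentioned here appear to refer to the marginalized importance sampling idea used in Kallus & Uehara's line of work: reweight individual transitions by the ratio of state-marginal distributions rather than by the horizon-long product of per-step ratios, so the weight variance no longer compounds with the horizon. A minimal sketch follows; the function names are ours, and we assume the marginal ratio w is handed to us, whereas in practice w itself must be estimated:

```python
import numpy as np

def marginalized_is_estimate(states, actions, rewards, w, rho):
    """Average-reward estimate under the evaluation policy.

    Each transition is weighted by w(s) * rho(s, a), where
      w(s)      = d_pi_e(s) / d_pi_b(s), the state-marginal density ratio,
      rho(s, a) = pi_e(a | s) / pi_b(a | s), the single-step action ratio,
    instead of the cumulative product of per-step ratios.
    """
    weights = w(states) * rho(states, actions)
    return np.mean(weights * rewards)

# Toy usage with made-up ratio functions on integer states/actions.
rng = np.random.default_rng(1)
states = rng.integers(0, 2, size=1000)
actions = rng.integers(0, 2, size=1000)
rewards = rng.normal(size=1000) + states
w = lambda s: np.where(s == 0, 0.8, 1.2)       # hypothetical marginal ratio
rho = lambda s, a: np.where(a == 0, 1.5, 0.5)  # hypothetical action ratio
print(marginalized_is_estimate(states, actions, rewards, w, rho))
```

Because each weight involves only one per-step ratio, its variance is bounded independently of the horizon, which is what "breaking the curse of horizon" refers to.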
“…where R is a second-order remainder that is doubly robust, with R(Q, Q_0, g, g_{0,t}) = 0 if either Q = Q_0 or g = g_{0,t}. […] (2018) and Kallus & Uehara (2019). We pursue the discussion on marginal parameters in more detail in subsection 8.1 in the appendix.…”

Section: Canonical Gradient and First Order Expansion of the Target P… (citation type: mentioning; confidence: 99%)

See 3 more Smart Citations
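The quoted double-robustness property, that the remainder R vanishes when either the outcome model Q or the treatment mechanism g is correct, can be seen in miniature in a single-step augmented IPW estimator. The following simulation is our own illustration under made-up models, not the cited paper's construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Synthetic single-step problem: treatment A ~ Bernoulli(g0(X)),
# outcome Y observed under treatment with regression Q0(X).
X = rng.uniform(-1, 1, n)
g0 = 1.0 / (1.0 + np.exp(-X))      # true propensity P(A = 1 | X)
A = rng.binomial(1, g0)
Q0 = 1.0 + 2.0 * X                 # true outcome regression E[Y | X, A = 1]
Y = np.where(A == 1, Q0 + rng.normal(0.0, 1.0, n), 0.0)
# Target: E[Y(1)] = E[Q0(X)] = 1 because E[X] = 0.

def aipw(Q_hat, g_hat):
    """Augmented IPW: consistent if Q_hat or g_hat is correct (double robustness)."""
    return np.mean(Q_hat + A / g_hat * (Y - Q_hat))

bad_Q = np.zeros(n)                # deliberately misspecified outcome model
bad_g = np.full(n, 0.5)            # deliberately misspecified propensity
print("both correct:", aipw(Q0, g0))       # ~1.0
print("Q wrong     :", aipw(bad_Q, g0))    # ~1.0 (correct g rescues the estimate)
print("g wrong     :", aipw(Q0, bad_g))    # ~1.0 (correct Q rescues the estimate)
print("both wrong  :", aipw(bad_Q, bad_g)) # biased, drifts away from 1.0
```

Only the last estimate is biased: the error of the estimator is a product of the errors of the two nuisance models, which is exactly the structure of the second-order remainder R in the quoted expansion.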