2022
DOI: 10.1080/01621459.2022.2096620

Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning

Cited by 7 publications (9 citation statements)
Citation types: 1 supporting, 8 mentioning, 0 contrasting
References 34 publications
“…In Theorem 4, we find that for each fraction r ∈ (0, 1], φ_T(r) is the most efficient RAL estimator, with its asymptotic variance matching the efficiency lower bound. This result answers an open question of efficiency in linear stochastic approximation raised by Ramprasad et al. [2021] and provides evidence of the statistical optimality of the partial-sum process φ_T in terms of asymptotic variance.…”
Section: Contributions (supporting)
Confidence: 63%
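For readers without the citing paper at hand: in the online-inference literature, the partial-sum process is usually the running Polyak–Ruppert average of the stochastic-approximation iterates. The following is a hedged sketch of that common convention, not the citing paper's exact definition; the symbols θ_t, θ*, Σ, and W are assumptions introduced here.

```latex
% Hedged sketch of a common convention, not the citing paper's statement.
% \theta_1, \dots, \theta_T are stochastic-approximation iterates with
% target \theta^*; the partial-sum (running-average) process is
\[
  \varphi_T(r) \;=\; \frac{1}{\lfloor rT \rfloor} \sum_{t=1}^{\lfloor rT \rfloor} \theta_t,
  \qquad r \in (0, 1].
\]
% Under standard conditions a functional CLT holds,
\[
  \sqrt{T}\, r \,\bigl(\varphi_T(r) - \theta^*\bigr) \;\Rightarrow\; \Sigma^{1/2} W(r),
\]
% with W a standard Brownian motion, so \varphi_T(r) has asymptotic
% variance \Sigma / r. "Most efficient RAL estimator" then says this
% variance attains the efficiency lower bound for each fixed r.
```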
“…However, in asynchronous reinforcement learning (RL) [Tsitsiklis, 1994, Even-Dar et al., 2003], data is generated along a single Markov chain, precluding the use of stochastic optimization methods. Inspired by resampling-based inference methods in stochastic optimization, bootstrap-based methods have been developed for linear policy evaluation tasks [White and White, 2010, Hanna et al., 2017, Hao et al., 2021, Ramprasad et al., 2021]. However, they are not suitable for nonlinear tasks, such as quantifying randomness in the optimal value function.…”
Section: Stochastic Approximation on Markovian Data (mentioning)
Confidence: 99%
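The passage above points to bootstrap-based methods for linear policy evaluation. As a rough illustration of that family of methods, here is a minimal Python sketch of an online multiplier bootstrap wrapped around linear TD(0): alongside the point-estimate iterate, it maintains randomly reweighted replicas of the same update and reads intervals off their spread. The function name `online_bootstrap_td`, the Exp(1) weights, and the step-size schedule are illustrative assumptions, not the exact algorithm of Ramprasad et al. [2021] or the other cited works.

```python
# Hedged sketch of an online multiplier bootstrap around linear TD(0).
# All names and tuning choices here are illustrative assumptions.
import numpy as np

def online_bootstrap_td(transitions, d, gamma=0.9, n_boot=200, lr=0.5, seed=0):
    """TD(0) on a stream of (phi_s, reward, phi_s_next) tuples, with
    n_boot randomly reweighted replicas maintained for inference."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)                # point-estimate iterate
    boots = np.zeros((n_boot, d))      # bootstrap replicas
    for t, (phi, r, phi_next) in enumerate(transitions, start=1):
        alpha = lr / t ** 0.75         # polynomially decaying step size
        td_err = r + gamma * (phi_next @ theta) - phi @ theta
        theta = theta + alpha * td_err * phi
        # Multiply each replica's whole update by an independent mean-1
        # weight (Exp(1) here); other mean-1, variance-1 choices also work.
        w = rng.exponential(1.0, size=n_boot)
        boot_err = r + gamma * (boots @ phi_next) - boots @ phi
        boots = boots + alpha * (w * boot_err)[:, None] * phi[None, :]
    return theta, boots

# Purely synthetic usage example: percentile interval for a linear
# functional of theta, read off the bootstrap replicas.
rng = np.random.default_rng(1)
d = 4
stream = [(rng.standard_normal(d), rng.standard_normal(), rng.standard_normal(d))
          for _ in range(5000)]
theta_hat, reps = online_bootstrap_td(stream, d)
lo, hi = np.quantile(reps @ np.ones(d), [0.025, 0.975])
```

The replicas are updated in one vectorized step, so the per-iteration cost stays linear in the number of replicas, which is what makes this kind of bootstrap feasible in the online, single-trajectory setting the quote describes.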
“…Existing inference tools for online learning based on SGD or its variants (Fang, 2019; Fang et al., 2018; Ramprasad et al., 2022) are mainly designed for the finite-dimensional setting. In contrast, our work deals with the infinite-dimensional setting.…”
Section: Introduction (mentioning)
Confidence: 99%