2021
DOI: 10.1111/rssb.12465

Statistical Inference of the Value Function for Reinforcement Learning in Infinite-Horizon Settings

Abstract: Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision-making problems. The goodness of a policy is measured by its value function starting from some initial state. The focus of this paper is to construct confidence intervals (CIs) for a policy's value in infinite-horizon settings where the number of decision points diverges to infinity. We propose to model the action-value state function (Q-function) associated with…

Cited by 24 publications (27 citation statements)
References 34 publications
“…2 is upper bounded by some constant. Using similar arguments in Part 3 of the proof of Lemma 3 in Shi et al (2021), we can show that…”
Section: B21 1 Type Test
Citation type: mentioning
confidence: 80%
“…Sixth, (A6) and (A7) are commonly imposed in the statistics literature on RL (see e.g., Luckett et al, 2020). (A6) is automatically satisfied when the behavior policy that determines Shi et al, 2021). (A7) is a necessary condition for establishing the limiting distribution of…”
Section: Technical Conditions
Citation type: mentioning
confidence: 99%
“…, where ξ π is the stationary distribution of underlying transition kernel and α is the decay rate of eigenvalues of kernel functions. Moreover, [Shi et al, 2021, Uehara et al, 2021] proposed their estimators for Q-functions derived the L2-norm convergence rate. [Chen and Qi, 2022] extended this result in a more general setting under weaker condition.…”
Section: Related Work
Citation type: mentioning
confidence: 99%