2018
DOI: 10.48550/arxiv.1808.09127
Preprint

High-confidence error estimates for learned value functions

Touqir Sajed,
Wesley Chung,
Martha White

Abstract: Estimating the value function for a fixed policy is a fundamental problem in reinforcement learning. Policy evaluation algorithms, which estimate value functions, continue to be developed to improve convergence rates, improve stability, and handle variability, particularly for off-policy learning. To understand the properties of these algorithms, the experimenter needs high-confidence estimates of the accuracy of the learned value functions. For environments with small, finite state spaces, like chains, the true va…
