2019
DOI: 10.48550/arxiv.1905.08293
Preprint

Issues concerning realizability of Blackwell optimal policies in reinforcement learning

Nicholas Denis

Abstract: N-discount optimality was introduced as a hierarchical form of policy- and value-function optimality, with Blackwell optimality lying at the top level of the hierarchy [17,3]. We formalize notions of myopic discount factors, value functions and policies in terms of Blackwell optimality in MDPs, and we provide a novel concept of regret, called Blackwell regret, which measures the regret compared to a Blackwell optimal policy. Our main analysis focuses on long horizon MDPs with sparse rewards. We show that selec…
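As a pointer for readers, the standard definition behind the abstract (following Puterman's treatment; the threshold notation γ_Bw matches the citation statement below) can be sketched as:

```latex
% A policy \pi^* is Blackwell optimal if it is \gamma-discount optimal
% for every discount factor close enough to 1:
\exists\, \gamma_{\mathrm{Bw}} \in [0,1):\quad
V^{\pi^*}_{\gamma}(s) \;\ge\; V^{\pi}_{\gamma}(s)
\qquad \forall\, \gamma \in [\gamma_{\mathrm{Bw}}, 1),\;
\forall\, \pi,\; \forall\, s \in S .
```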

Cited by 1 publication (1 citation statement); references 7 publications.

“…Intuitively, Blackwell optimality states that once we look sufficiently far into the future, i.e. for any γ ≥ γ_Bw, no policy is better than the Blackwell optimal policies (Denis, 2019). For more exposition, see Puterman (1994, Ch. 5.4.3).…”
Section: Additional Related Work
confidence: 99%
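The intuition in the statement above can be illustrated with a minimal, hypothetical example (not from the paper): two fixed reward streams whose comparison flips at a threshold discount factor, playing the role of γ_Bw.

```python
# Hypothetical illustration of a Blackwell-style threshold gamma_Bw:
# a myopic policy A (reward 1 now) versus a far-sighted policy B
# (reward 2 one step later).

def discounted_value(rewards, gamma):
    """Discounted return of a finite reward stream."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

stream_a = [1, 0]   # myopic: reward 1 immediately, nothing after
stream_b = [0, 2]   # far-sighted: nothing now, reward 2 at the next step

# V_A = 1 and V_B = 2*gamma, so the two tie at gamma = 0.5: for every
# gamma >= 0.5 the far-sighted policy is at least as good, mimicking
# the role of gamma_Bw in Blackwell optimality.
for gamma in (0.3, 0.5, 0.9):
    va = discounted_value(stream_a, gamma)
    vb = discounted_value(stream_b, gamma)
    print(f"gamma={gamma}: V_A={va:.2f}, V_B={vb:.2f}")
```

Here γ_Bw = 0.5: below it the myopic stream wins, at and above it the far-sighted one does, which is the "sufficiently far into the future" condition in miniature.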