2019
DOI: 10.48550/arxiv.1905.08293
Preprint

Issues concerning realizability of Blackwell optimal policies in reinforcement learning

Nicholas Denis

Abstract: N-discount optimality was introduced as a hierarchical form of policy- and value-function optimality, with Blackwell optimality lying at the top level of the hierarchy [17,3]. We formalize notions of myopic discount factors, value functions and policies in terms of Blackwell optimality in MDPs, and we provide a novel concept of regret, called Blackwell regret, which measures the regret compared to a Blackwell optimal policy. Our main analysis focuses on long horizon MDPs with sparse rewards. We show that selec…
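As a pointer for readers, the standard definition behind the abstract (following Puterman's treatment; the threshold notation γ_Bw matches the citation statement below) can be sketched as:

```latex
% A policy \pi^* is Blackwell optimal if it is \gamma-discount optimal
% for every discount factor close enough to 1:
\exists\, \gamma_{\mathrm{Bw}} \in [0,1):\quad
V^{\pi^*}_{\gamma}(s) \;\ge\; V^{\pi}_{\gamma}(s)
\qquad \forall\, \gamma \in [\gamma_{\mathrm{Bw}}, 1),\;
\forall\, \pi,\; \forall\, s \in S .
```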

Cited by 1 publication (1 citation statement); references 7 publications.

“…Intuitively, Blackwell optimality states that once we look sufficiently far into the future, i.e. for any γ ≥ γ_Bw, no policy is better than the Blackwell optimal policies (Denis, 2019). For more exposition, see Puterman (1994, Ch. 5.4.3).…”
Section: Additional Related Work
confidence: 99%
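The intuition in the statement above can be illustrated with a minimal, hypothetical example (not from the paper): two fixed reward streams whose comparison flips at a threshold discount factor, playing the role of γ_Bw.

```python
# Hypothetical illustration of a Blackwell-style threshold gamma_Bw:
# a myopic policy A (reward 1 now) versus a far-sighted policy B
# (reward 2 one step later).

def discounted_value(rewards, gamma):
    """Discounted return of a finite reward stream."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

stream_a = [1, 0]   # myopic: reward 1 immediately, nothing after
stream_b = [0, 2]   # far-sighted: nothing now, reward 2 at the next step

# V_A = 1 and V_B = 2*gamma, so the two tie at gamma = 0.5: for every
# gamma >= 0.5 the far-sighted policy is at least as good, mimicking
# the role of gamma_Bw in Blackwell optimality.
for gamma in (0.3, 0.5, 0.9):
    va = discounted_value(stream_a, gamma)
    vb = discounted_value(stream_b, gamma)
    print(f"gamma={gamma}: V_A={va:.2f}, V_B={vb:.2f}")
```

Here γ_Bw = 0.5: below it the myopic stream wins, at and above it the far-sighted one does, which is the "sufficiently far into the future" condition in miniature.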