2018
DOI: 10.48550/arxiv.1807.03064
Preprint

Temporal Difference Learning with Neural Networks - Study of the Leakage Propagation Problem

Abstract: Temporal-Difference learning (TD) [Sutton, 1988] with function approximation can converge to solutions that are worse than those obtained by Monte-Carlo regression, even in the simple case of on-policy evaluation. To increase our understanding of the problem, we investigate the issue of approximation errors in areas of sharp discontinuities of the value function being further propagated by bootstrap updates. We show empirical evidence of this leakage propagation, and show analytically that it must occur, in a …
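The mechanism the abstract describes can be illustrated with a small sketch (my own construction, not the paper's experimental setup): a deterministic chain whose true value function drops sharply, approximated by state aggregation with one bin straddling the drop. Monte-Carlo regression fits returns directly, so its error stays inside the straddling bin; the TD(0) fixed point bootstraps from that bin, so the error leaks into upstream bins as well.

```python
import numpy as np

# Minimal sketch (not the paper's setup): a deterministic 10-state chain
# 0 -> 1 -> ... -> 9 -> terminal, discount gamma = 0.9. A reward of 1 is
# given on leaving state 4, so the true value drops sharply from
# V(4) = 1 to V(5) = 0. State aggregation ties pairs of states
# {0,1}, {2,3}, {4,5}, ...; the bin {4,5} straddles the discontinuity.
N, gamma = 10, 0.9
rewards = np.zeros(N); rewards[4] = 1.0
true_v = np.zeros(N)
true_v[:5] = gamma ** np.arange(4, -1, -1)   # V(i) = gamma^(4-i) for i <= 4
group = np.arange(N) // 2                    # shared-bin feature index
G = group.max() + 1

# Monte-Carlo regression: least-squares fit of the returns, i.e. the
# per-bin mean of the true values under uniform state visitation.
mc = np.array([true_v[group == g].mean() for g in range(G)])

# TD(0) fixed point under aggregation: w_g is the mean, over states s in
# bin g, of r(s) + gamma * w_{group(s')}, with the terminal state worth 0.
A = np.zeros((G, G)); b = np.zeros(G)
for s in range(N):
    g = group[s]
    b[g] += rewards[s] / 2.0                 # each bin holds 2 states
    if s + 1 < N:                            # bootstrap from next state's bin
        A[g, group[s + 1]] += gamma / 2.0
td = np.linalg.solve(np.eye(G) - A, b)

# Compare errors on states OUTSIDE the straddling bin {4,5}: MC's error
# is confined to that bin, while TD's bootstrap has propagated the bin's
# approximation error to the upstream bins.
outside = group != 2
err_mc = np.abs(mc[group] - true_v)[outside].mean()
err_td = np.abs(td[group] - true_v)[outside].mean()
print(f"mean |error| outside the discontinuity bin: MC={err_mc:.3f}, TD={err_td:.3f}")
```

Solving for the fixed point rather than running stochastic updates keeps the comparison exact: the straddling bin's parameter converges to 1/(2 − γ), and every upstream bin inherits a discounted share of that error, whereas the Monte-Carlo fit leaves all upstream bins unbiased.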

Cited by 1 publication (1 citation statement)
References 11 publications
“…The errors, however, behave differently for the two methods: while the errors of Monte Carlo are localized to the regions with discontinuities due to directly fitting the data, TD bootstraps values from this problematic region, and thus propagates the errors even further. We refer to such errors arising due to discontinuities as leakage [16].…”
Section: Temporal-Difference for Policy Evaluation
Confidence: 99%