2017
DOI: 10.48550/arxiv.1709.04073
Preprint

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging

Abstract: We consider d-dimensional linear stochastic approximation algorithms (LSAs) with a constant step-size and the so-called Polyak-Ruppert (PR) averaging of iterates. LSAs are widely applied in machine learning and reinforcement learning (RL), where the aim is to compute an appropriate θ* ∈ R^d (that is, an optimum or a fixed point) using noisy data and O(d) updates per iteration. In this paper, we are motivated by the problem (in RL) of policy evaluation from experience replay using the temporal difference (TD) c…
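As a rough illustration of the scheme the abstract names, the sketch below runs a constant step-size LSA iteration θ ← θ + α (b_t − A_t θ) and keeps a running Polyak-Ruppert average of the iterates. It is not the authors' code. The function names `lsa_pr_average` and `make_sampler`, the 2×2 toy problem, the Gaussian noise model, and the step-size value are all assumptions made for illustration; the dense matrix-vector product here costs O(d²) per step, whereas the O(d) cost mentioned in the abstract would rely on structure in A_t that this toy example does not have.

```python
import random


def lsa_pr_average(sample, theta0, step_size, num_steps):
    """Constant step-size LSA with a running Polyak-Ruppert average.

    `sample()` is a hypothetical callback returning a noisy pair (A_t, b_t).
    Returns the last iterate and the average of theta_0, ..., theta_k.
    """
    d = len(theta0)
    theta = list(theta0)
    theta_bar = list(theta0)  # average of the single iterate theta_0
    for t in range(1, num_steps + 1):
        A_t, b_t = sample()
        # Noisy residual b_t - A_t @ theta.
        resid = [
            b_t[i] - sum(A_t[i][j] * theta[j] for j in range(d))
            for i in range(d)
        ]
        # LSA update with a constant step-size.
        theta = [theta[i] + step_size * resid[i] for i in range(d)]
        # Incremental mean over the t + 1 iterates seen so far.
        theta_bar = [
            theta_bar[i] + (theta[i] - theta_bar[i]) / (t + 1)
            for i in range(d)
        ]
    return theta, theta_bar


def make_sampler(noise_std, seed=0):
    """Toy i.i.d. noise model (an assumption): A = diag(2, 1), b = (2, 1),
    so the fixed point A^{-1} b is (1, 1)."""
    rng = random.Random(seed)
    A = [[2.0, 0.0], [0.0, 1.0]]
    b = [2.0, 1.0]

    def sample():
        A_t = [[a + rng.gauss(0.0, noise_std) for a in row] for row in A]
        b_t = [x + rng.gauss(0.0, noise_std) for x in b]
        return A_t, b_t

    return sample


if __name__ == "__main__":
    last, avg = lsa_pr_average(make_sampler(0.1), [0.0, 0.0], 0.1, 20000)
    print("last iterate:    ", last)
    print("averaged iterate:", avg)
```

In this toy setting the last iterate keeps fluctuating around the fixed point at a scale set by the step-size, while the averaged iterate settles much closer to it, which is the kind of behaviour that motivates pairing a constant step-size with iterate averaging.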

Cited by 1 publication (2 citation statements)
References 3 publications

“…Table 1. Convergence results for gradient-based TD algorithms shown in previous work (Sutton et al, 2009b; Liu et al, 2015; Wang et al, 2017; Lakshminarayanan & Szepesvári, 2017; Dalal et al, 2017). θk stand for the Polyak-average of iterates: θk…”
Section: Results (mentioning)
confidence: 87%
“…Wang et al (2017) studied also the same version as Liu et al (2015) but for the case of Markov noise case instead of the i.i.d assumptions. They prove that with high probability […]. Lakshminarayanan & Szepesvári (2017) improved on the existing results by showing for the first time that E[‖θk − θ‖²] ∈ O(1/k) without projection step. However, the result still consider the Polyak-average of iterates.…”
Section: Related Work and Discussion (mentioning)
confidence: 89%