2021
DOI: 10.48550/arxiv.2112.12770
Preprint

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

Abstract: We study stochastic approximation procedures for approximately solving a d-dimensional linear fixed point equation based on observing a trajectory of length n from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} d / n$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, …
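As a rough sketch of the setup the abstract describes (the symbols $\bar{A}$, $\bar{b}$, $A(x_t)$, $b(x_t)$, $\eta$, and $\theta_t$ below are assumed notation for illustration, not taken from the paper): the target $\theta^*$ solves a linear fixed point equation, the iterates are driven by a single Markovian trajectory, and the last-iterate error scales as $t_{\mathrm{mix}} d / n$.

% Minimal sketch under the assumed notation above; requires amsmath.
\begin{align*}
  &\text{Fixed point: } && \bar{A}\,\theta^{*} = \bar{b}, \qquad \theta^{*} \in \mathbb{R}^{d},\\
  &\text{Linear SA update along } (x_1,\dots,x_n): && \theta_{t+1} = \theta_{t} + \eta\bigl(b(x_t) - A(x_t)\,\theta_{t}\bigr),\\
  &\text{Last-iterate error (order): } && \mathbb{E}\bigl[\lVert \theta_{n} - \theta^{*}\rVert_{2}^{2}\bigr] \;\lesssim\; \frac{t_{\mathrm{mix}}\, d}{n}.
\end{align*}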


Cited by 3 publications (4 citation statements)
References 22 publications (38 reference statements)
“…For stochastic gradient (SG) methods in the Euclidean setting, such bounds have been established for Polyak-Ruppert-averaged SG [MB11, GP17] and variance-reduced SG algorithms [FGKS15, LMWJ20], with the sample complexity and high-order terms being improved over time. For reinforcement learning problems, such guarantees have been established in the ℓ∞-norm for temporal difference methods [KPR+20] and Q-learning [KXWJ21] under a generative model, as well as for Markovian trajectories [MPWB21, LLP21] under the ℓ2-norm. In the context of stochastic optimization, the paper [LMWJ20] provides a fine-grained bound for ROOT-SGD with a unity pre-factor on the leading-order instance-dependent term.…”
Section: Stochastic Approximation and Asymptotic Guarantees (mentioning)
confidence: 99%
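For context on the Polyak-Ruppert averaging these excerpts refer to, a minimal sketch under assumed notation (the averaged iterate $\bar{\theta}_n$ and the limiting covariance below are the standard heuristic, not expressions taken from the cited works):

% Sketch only; $\Sigma$ denotes the long-run covariance of the update noise (assumed notation).
\begin{align*}
  \bar{\theta}_{n} = \frac{1}{n}\sum_{t=1}^{n}\theta_{t},
  \qquad
  \sqrt{n}\,\bigl(\bar{\theta}_{n} - \theta^{*}\bigr)
  \;\overset{d}{\longrightarrow}\;
  \mathcal{N}\!\bigl(0,\ \bar{A}^{-1}\,\Sigma\,\bar{A}^{-\top}\bigr).
\end{align*}

Instance-dependent bounds of the kind discussed in these excerpts aim to match this covariance-dependent leading term non-asymptotically, ideally with a unity pre-factor.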
“…Constant stepsize has gained popularity recently, particularly among practitioners, due to its fast initial convergence and easy hyperparameter tuning. A growing line of work studies the convergence properties of SA under constant stepsize, establishing upper bounds on the mean-squared error (MSE) (Lakshminarayanan and Szepesvári 2018; Srikant and Ying 2019; Mou et al. 2020, 2021) as well as weak convergence results (Dieuleveut, Durmus, and Bach 2020; Yu et al. 2021; Huo, Chen, and Xie 2023a).…”
Section: Introduction (mentioning)
confidence: 99%
“…Initial analyses of least squares temporal difference learning (LSTD) date back to the work of Baird [1995], Bradtke and Barto [1996], Boyan [1999], and Nedić and Bertsekas [2003]. Since then, the finite sample performance of the algorithm has been analyzed by Lazaric et al. [2012], Bhandari et al. [2018], and Duan et al. [2021], and its behavior in the offline setting studied by Yu [2010], Li et al. [2021], and Mou et al. [2020, 2021]. Tu and Recht [2018] analyze on-policy LSTD for the LQR setting.…”
Section: FQI Introduced By (mentioning)
confidence: 99%
“…Extensions to Markovian data are fairly well-understood, see e.g., Mou et al. [2021], Nagaraj et al. [2020]. Note that realizability of Q^π does not imply that the rewards are linearly realizable.…”
(mentioning)
confidence: 99%