2019
DOI: 10.48550/arxiv.1912.02270
Preprint

A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms

Abstract: In this paper, we introduce a unified framework for analyzing a large family of Q-learning algorithms, based on switching system perspectives and ODE-based stochastic approximation. We show that the nonlinear ODE models associated with these Q-learning algorithms can be formulated as switched linear systems, and analyze their asymptotic stability by leveraging existing switching system theories. Our approach provides the first O.D.E. analysis of the asymptotic convergence of various Q-learning algorithms, incl…
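For readers skimming this page, a rough sketch of the formulation the abstract alludes to may help. This is not taken verbatim from the paper; the notation (D, R, P, \Pi_\sigma) is assumed here for illustration only. For tabular Q-learning, the mean ODE of the iterate Q_t can be written as

\begin{align*}
  \dot{Q}_t
    &= D\bigl(R + \gamma P \Pi_{\sigma(Q_t)} Q_t - Q_t\bigr) \\
    &= \underbrace{\bigl(\gamma D P \Pi_{\sigma(Q_t)} - D\bigr)}_{A_{\sigma(Q_t)}} Q_t
       + \underbrace{D R}_{b},
  \qquad \sigma(Q) := \text{greedy policy induced by } Q,
\end{align*}

where D is a diagonal matrix of state-action visitation frequencies, R is the expected-reward vector, and P is the state-action-to-state transition matrix. Because there are only finitely many deterministic greedy policies, A_{\sigma(Q_t)} ranges over a finite set of matrices, so the ODE is a switched affine linear system whose switching signal is the greedy policy; this is the structure that allows the stability analysis via switching system theory described in the abstract.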

Cited by 11 publications (21 citation statements)
References 19 publications

“…Assumption 4(b) is more general than the i.i.d. assumption in Szepesvári (1998); Lee and He (2019), and is similar in spirit to the covering time assumption in Even-Dar and Mansour (2003) and another related assumption in Beck and Srikant (2012).…”
Section: Application To Q-learning (mentioning)
confidence: 52%
“…the i.i.d. assumption used in Szepesvári (1998); Lee and He (2019) and the covering time assumption used in Even-Dar and Mansour (2003)) and require every entry of the Q-function to be accurately estimated.…”
Section: Introduction (mentioning)
confidence: 99%
“…The Q-learning algorithm is perhaps one of the most well-known RL algorithms in the literature [44]. The asymptotic convergence of Q-learning was established in [41,9,10,19,25]. As for finite-sample bounds, [4,5,43,12] study the mean-square bounds of synchronous Q-learning.…”
Section: Related Literature (mentioning)
confidence: 99%
“…For Q-learning with asynchronous update, [4,5] study the mean-square convergence bounds for using constant stepsize, and [16,21,20,29,27] study the high-probability bounds. When Q-learning is used along with function approximation, the asymptotic convergence and finite-sample bounds are studied in [28,13,25,46,17]. Variants of Q-learning algorithms such as double Q-learning, speedy Q-learning, and fitted Q-iteration are also studied in the literature [18,1,45,14].…”
Section: Related Literature (mentioning)
confidence: 99%
“…For example, in Greedy-GQ (Maei et al, 2010; Wang and Zou, 2020), a control algorithm in the family of the gradient TD methods, the behavior policy is assumed to be fixed. In the convergent analysis of linear Q-learning (Melo et al, 2008; Lee and He, 2019), the behavior policy is assumed to be sufficiently close to the policy that linear Q-learning is expected to converge to.…”
Section: Control: Truncated Emphatic Expected Sarsa (mentioning)
confidence: 99%