2019
DOI: 10.48550/arxiv.1912.02270
Preprint

A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms

Abstract: In this paper, we introduce a unified framework for analyzing a large family of Q-learning algorithms, based on switching system perspectives and ODE-based stochastic approximation. We show that the nonlinear ODE models associated with these Q-learning algorithms can be formulated as switched linear systems, and analyze their asymptotic stability by leveraging existing switching system theories. Our approach provides the first O.D.E. analysis of the asymptotic convergence of various Q-learning algorithms, incl…
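For readers skimming this page, a rough sketch of the formulation the abstract alludes to may help. This is not taken verbatim from the paper; the notation (D, R, P, \Pi_\sigma) is assumed here for illustration only. For tabular Q-learning, the mean ODE of the iterate Q_t can be written as

\begin{align*}
  \dot{Q}_t
    &= D\bigl(R + \gamma P \Pi_{\sigma(Q_t)} Q_t - Q_t\bigr) \\
    &= \underbrace{\bigl(\gamma D P \Pi_{\sigma(Q_t)} - D\bigr)}_{A_{\sigma(Q_t)}} Q_t
       + \underbrace{D R}_{b},
  \qquad \sigma(Q) := \text{greedy policy induced by } Q,
\end{align*}

where D is a diagonal matrix of state-action visitation frequencies, R is the expected-reward vector, and P is the state-action-to-state transition matrix. Because there are only finitely many deterministic greedy policies, A_{\sigma(Q_t)} ranges over a finite set of matrices, so the ODE is a switched affine linear system whose switching signal is the greedy policy; this is the structure that allows the stability analysis via switching system theory described in the abstract.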

Cited by 11 publications (21 citation statements)
References 19 publications

“…Assumption 4(b) is more general than the i.i.d. assumption in Szepesvári (1998); Lee and He (2019), and is similar in spirit to the covering time assumption in Even-Dar and Mansour (2003) and another related assumption in Beck and Srikant (2012).…”
Section: Application To Q-learning (mentioning)
confidence: 52%
“…the i.i.d. assumption used in Szepesvári (1998); Lee and He (2019) and the covering time assumption used in Even-Dar and Mansour (2003)) and require every entry of the Q-function to be accurately estimated.…”
Section: Introduction (mentioning)
confidence: 99%
“…The Q-learning algorithm is perhaps one of the most well-known RL algorithms in the literature [44]. The asymptotic convergence of Q-learning was established in [41,9,10,19,25]. As for finite-sample bounds, [4,5,43,12] study the mean-square bounds of synchronous Q-learning.…”
Section: Related Literature (mentioning)
confidence: 99%
“…For Q-learning with asynchronous update, [4,5] study the mean-square convergence bounds for using constant stepsize, and [16,21,20,29,27] study the high-probability bounds. When Q-learning is used along with function approximation, the asymptotic convergence and finite-sample bounds are studied in [28,13,25,46,17]. Variants of Q-learning algorithms such as double Q-learning, speedy Q-learning, and fitted Q-iteration are also studied in the literature [18,1,45,14].…”
Section: Related Literature (mentioning)
confidence: 99%
“…For example, in Greedy-GQ (Maei et al, 2010; Wang and Zou, 2020), a control algorithm in the family of the gradient TD methods, the behavior policy is assumed to be fixed. In the convergent analysis of linear Q-learning (Melo et al, 2008; Lee and He, 2019), the behavior policy is assumed to be sufficiently close to the policy that linear Q-learning is expected to converge to.…”
Section: Control: Truncated Emphatic Expected Sarsa (mentioning)
confidence: 99%