2021
DOI: 10.48550/arxiv.2101.08862
Preprint

Breaking the Deadly Triad with a Target Network


Cited by 4 publications (9 citation statements) · References 27 publications
Citation statements: 0 supporting, 9 mentioning, 0 contrasting
“…Learning rates for Greedy GQ (GGQ) and Coupled Q Learning (CQL) are set as 0.05 and 0.25, respectively, as in Carvalho et al., 2020 and Maei et al., 2010. Since CQL requires normalized feature values, we scaled the feature values by 1/2 as in Carvalho et al., 2020, and initialized the weights as one. We implemented Q-learning with a target network (Zhang et al., 2021) without projection, for practical reasons (Qtarget). We set the learning rates as 0.25 and 0.05, respectively, and the weight η as two.…”
Section: Methods (mentioning)
confidence: 99%
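The quoted setup describes linear Q-learning with a target network (no projection), weights initialized to one, and learning rates of 0.05/0.25. A minimal sketch of that kind of baseline is given below; it is an illustrative reconstruction, not the code of the citing paper or of Zhang et al., 2021. The environment interface, the feature map `phi`, the ε-greedy behaviour policy, and the target refresh period are assumptions, and the quoted weight η and the 1/2 feature scaling used for CQL are not modelled here.

```python
# Illustrative sketch only: semi-gradient Q-learning with linear function
# approximation and a target network, without a projection step, roughly
# matching the quoted setup. `env` (with reset()/step() returning
# (next_state, reward, done)) and the feature map `phi(s, a)` are
# hypothetical placeholders, not part of the cited papers.
import numpy as np

def q_learning_with_target_network(env, phi, num_actions, num_steps=10_000,
                                   alpha=0.05,         # learning rate (quoted value)
                                   target_period=100,  # assumed refresh interval
                                   gamma=0.99, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = phi(env.reset(), 0).shape[0]
    w = np.ones(d)            # online weights, initialized to one as in the quote
    w_target = w.copy()       # target-network weights

    s = env.reset()
    for t in range(num_steps):
        # epsilon-greedy action selection from the online weights
        if rng.random() < epsilon:
            a = int(rng.integers(num_actions))
        else:
            a = int(np.argmax([phi(s, b) @ w for b in range(num_actions)]))
        s_next, r, done = env.step(a)

        # bootstrap target computed from the *target* weights
        q_next = 0.0 if done else max(phi(s_next, b) @ w_target
                                      for b in range(num_actions))
        td_error = r + gamma * q_next - phi(s, a) @ w
        w += alpha * td_error * phi(s, a)

        # periodic hard copy of the online weights into the target network
        if (t + 1) % target_period == 0:
            w_target = w.copy()

        s = env.reset() if done else s_next
    return w
```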
“…Carvalho et al., 2020 and Zhang et al., 2021 assume ‖x(s, a)‖∞ ≤ 1 for all (s, a) ∈ S × A. Moreover, Zhang et al., 2021 requires specific bounds on the feature matrix which depend on various factors, e.g., the projection radius and the transition matrix.…”
Section: Q-learning With Linear Function Approximation (mentioning)
confidence: 99%
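The boundedness assumption ‖x(s, a)‖∞ ≤ 1 discussed in this statement can be enforced by rescaling the feature matrix by its largest absolute entry. The sketch below is only an illustration of the assumption, not code from any of the cited papers; the raw feature matrix `X_raw` is a hypothetical input.

```python
# Minimal sketch: rescale raw features so that ||x(s, a)||_inf <= 1 holds for
# every state-action pair. `X_raw` (one row per (s, a) pair) is hypothetical.
import numpy as np

def scale_features_to_unit_sup_norm(X_raw):
    """Divide by the largest absolute entry so every feature vector has
    sup-norm at most 1."""
    max_abs = np.max(np.abs(X_raw))
    return X_raw if max_abs == 0 else X_raw / max_abs

X_raw = np.array([[0.5, 3.0],
                  [-2.0, 1.0]])
X = scale_features_to_unit_sup_norm(X_raw)
assert np.max(np.abs(X)) <= 1.0   # the assumption now holds by construction
```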
“…Despite their resounding empirical success in deep RL, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. Theoretical contributions investigating the use of a target network are very recent and limited to temporal difference (TD) learning for policy evaluation [23] and critic-only methods such as Q-learning for control [48]. In particular, these works are not concerned with actor-critic algorithms and leave the question of the finite-time analysis open.…”
Section: Introduction (mentioning)
confidence: 99%