2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6288330

Tuning-free step-size adaptation

Cited by 42 publications (49 citation statements) · References 7 publications
“…We demonstrate that Scalar Metatrace improves robustness to initial step-size choice in a standard RL domain, while Mixed Metatrace facilitates learning in an RL problem with nonstationary state representation. The latter result extends results of [15] and [8] from the SL case. Reasoning that such non-stationarity in the state representation is an inherent feature of NN function approximation, we also test the method for training a neural network online for several games in the ALE.…”
Section: Results (supporting)
confidence: 90%
“…This entropy-regularized extension to the basic algorithm is used in lines 7, 10, and 14 of Algorithm 1. Algorithm 1 also incorporates a normalization technique, analogous to that used in [8], that we will now discuss.…”
Section: Scalar Metatrace for AC(λ) (mentioning)
confidence: 99%
“…Recently, Mahmood and Sutton [17], [18] proposed Autostep, an extension to IDBD that has much less dependence on the meta-step-size parameter than IDBD. In the same year, Dabney and Barto [19] developed another adaptive step-size method for temporal difference learning, which is based on the estimation of upper and lower bounds.…”
Section: Related Work (mentioning)
confidence: 99%
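To make the quoted comparison concrete, below is a minimal Python sketch of the per-feature Autostep update for linear supervised learning as we understand it from Mahmood and Sutton's paper: an IDBD-style meta-gradient, a running normalizer v that makes the step-size update roughly scale-invariant (this is what reduces sensitivity to the meta step size), and a final rescaling that bounds the effective step size. The variable names and the hyperparameters mu (meta step size) and tau (normalizer time constant) are our own notation; exact details should be checked against the paper, so treat this as a sketch, not a definitive implementation.

```python
import numpy as np

def autostep_update(w, alpha, h, v, x, y, mu=1e-2, tau=1e4):
    """One Autostep update for a linear predictor (sketch).

    w     -- weight vector
    alpha -- per-feature step sizes
    h     -- trace of recent weight updates (as in IDBD)
    v     -- running normalizer for the meta-gradient magnitude
    """
    delta = y - w.dot(x)                 # prediction error
    g = delta * x * h                    # IDBD-style meta-gradient

    # Track the magnitude of the meta-gradient; dividing by v below
    # makes the step-size update insensitive to the scale of g.
    v = np.maximum(np.abs(g),
                   v + (1.0 / tau) * alpha * x**2 * (np.abs(g) - v))
    nz = v != 0
    alpha[nz] = alpha[nz] * np.exp(mu * g[nz] / v[nz])

    # Bound the effective step size so one update cannot overshoot:
    # rescale so that sum(alpha * x^2) <= 1.
    m = max(alpha.dot(x**2), 1.0)
    alpha = alpha / m

    w = w + alpha * delta * x                          # weight update
    h = h * (1.0 - alpha * x**2) + alpha * delta * x   # trace update
    return w, alpha, h, v
```

In use, w, h, and v would start at zero and alpha at a small constant (e.g. 0.1 / n for n features), with the function called once per (x, y) example; the normalization steps are what let the same mu work across problems without per-problem tuning.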