Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/581

Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Abstract: Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this, most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the o…
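
As a rough illustration of the idea the abstract refers to (online step-size tuning by meta-gradient descent), the sketch below shows the classic IDBD rule of Sutton (1992) on a toy streaming linear-regression problem: each weight gets its own log step-size beta_i, and a trace h_i of how that weight depends on beta_i supplies the meta-gradient. This is only a simplified stand-in, not the paper's Metatrace algorithm (which applies the same principle to actor-critic with eligibility traces); the function name idbd_step, the meta_lr parameter, and the toy data are illustrative.

import numpy as np

def idbd_step(w, h, beta, x, y_target, meta_lr=0.01):
    """One IDBD update: per-weight step-sizes alpha_i = exp(beta_i), adapted online.

    h_i tracks (approximately) the derivative of w_i with respect to beta_i,
    so delta * x_i * h_i is a sample of the meta-gradient on the log step-size.
    """
    delta = y_target - w @ x                   # prediction error
    beta = beta + meta_lr * delta * x * h      # meta-gradient step on log step-sizes
    alpha = np.exp(beta)
    w = w + alpha * delta * x                  # base LMS update with adapted step-sizes
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, h, beta

# Toy usage on a streaming linear-regression problem.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w, h, beta = np.zeros(3), np.zeros(3), np.full(3, np.log(0.05))
for _ in range(5000):
    x = rng.normal(size=3)
    y = true_w @ x + 0.1 * rng.normal()
    w, h, beta = idbd_step(w, h, beta, x, y)
print("learned weights:", np.round(w, 2))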

Cited by 7 publications (8 citation statements) | References 13 publications

“…Several previously discussed meta-representations have been explored in RL including learning the initial conditions [19], [169], hyperparameters [169], [173], step directions [76] and step sizes [171]. These enable gradient-based learning of a neural policy with fewer environmental interactions and training fast convolutional [38] or recurrent [23], [112] black-box models to synthesize a policy by embedding the environment experience.…”
Section: Methods
confidence: 99%
“…[survey table excerpt: hyperparameter meta-learning methods including HyperRep [20], HyperOpt [66], LHML [68], and MetaTrace [171]; feed-forward models including SNAIL [38] and CNAP [107]; other listed entries include PSD [78]]…”
Section: Gradient RL Evolution
confidence: 99%
“…In reinforcement learning, Kearney et al. (2018) extended IDBD to TD methods for stationary and non-stationary prediction tasks, showing better results than TD methods with a constant step-size or with scalar step-size adaptation. Metatrace (Young et al., 2018) adapted the step-size of the actor-critic algorithm with eligibility traces, in both the scalar and the component-wise step-size case, which can accelerate the learning process of the actor-critic algorithm.…”
Section: IDBD
confidence: 99%
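
To make the idea in the excerpt above concrete, here is a minimal prediction-only sketch of meta-gradient step-size adaptation for linear TD(lambda) with an accumulating eligibility trace: a single scalar log step-size is adjusted online using a trace h of how the weights depend on it. This is a simplified illustration under a semi-gradient approximation, not the published Metatrace (actor-critic) or TIDBD algorithm; names such as td_lambda_meta_step and meta_lr, and the random-walk demo, are made up for the example.

import numpy as np

def td_lambda_meta_step(w, h, log_alpha, e, x, r, x_next, gamma, lam, meta_lr):
    """One linear TD(lambda) step with a meta-gradient update of a scalar log step-size.

    w         : weights of the linear value estimate v(s) = w @ x
    h         : running estimate of d(w)/d(log_alpha)
    e         : accumulating eligibility trace
    log_alpha : log of the scalar step-size alpha
    """
    delta = r + gamma * (w @ x_next) - (w @ x)   # semi-gradient TD error
    e = gamma * lam * e + x                      # eligibility trace

    # Meta step: raise log_alpha when past updates (summarized by h) point in
    # the direction that reduces the current TD error, lower it otherwise.
    log_alpha += meta_lr * delta * (x @ h)
    alpha = np.exp(log_alpha)

    w = w + alpha * delta * e                    # base TD(lambda) update
    # Track how the new weights depend on log_alpha (semi-gradient approximation).
    h = h + alpha * delta * e - alpha * e * (x @ h)
    return w, h, log_alpha, e, delta

# Tiny demo: 5-state random walk, reward +1 on the right terminal.
rng = np.random.default_rng(1)
n = 5
w, h = np.zeros(n), np.zeros(n)
log_alpha = np.log(0.1)
gamma, lam, meta_lr = 1.0, 0.9, 0.005
for episode in range(2000):
    s, e = 2, np.zeros(n)                        # start in the middle, reset trace
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        terminal = s_next < 0 or s_next == n
        x = np.eye(n)[s]
        x_next = np.zeros(n) if terminal else np.eye(n)[s_next]
        w, h, log_alpha, e, _ = td_lambda_meta_step(
            w, h, log_alpha, e, x, r, x_next, gamma, lam, meta_lr)
        if terminal:
            break
        s = s_next
print("values:", np.round(w, 2), "alpha:", round(float(np.exp(log_alpha)), 3))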
“…Meta-gradients have been previously used to learn intrinsic rewards for policy gradient (Zheng et al., 2018) and auxiliary tasks (Veeriah et al., 2019). Meta-gradients have also been used to adapt optimizer parameters (Young et al., 2018; Franceschi et al., 2017). In our setup, we consider the continuous control setting, provide the first implementation of meta-gradients for an algorithm that uses an experience replay, and focus on adapting meta-parameters that encourage soft constraint satisfaction while maximizing expected return.…”
Section: Introduction
confidence: 99%