Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/581

Metatrace Actor-Critic: Online Step-Size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Abstract: Reinforcement learning (RL) has had many successes in both "deep" and "shallow" settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this, most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the o…
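
As a rough illustration of the idea the abstract refers to (online step-size tuning by meta-gradient descent), the sketch below shows the classic IDBD rule of Sutton (1992) on a toy streaming linear-regression problem: each weight gets its own log step-size beta_i, and a trace h_i of how that weight depends on beta_i supplies the meta-gradient. This is only a simplified stand-in, not the paper's Metatrace algorithm (which applies the same principle to actor-critic with eligibility traces); the function name idbd_step, the meta_lr parameter, and the toy data are illustrative.

import numpy as np

def idbd_step(w, h, beta, x, y_target, meta_lr=0.01):
    """One IDBD update: per-weight step-sizes alpha_i = exp(beta_i), adapted online.

    h_i tracks (approximately) the derivative of w_i with respect to beta_i,
    so delta * x_i * h_i is a sample of the meta-gradient on the log step-size.
    """
    delta = y_target - w @ x                   # prediction error
    beta = beta + meta_lr * delta * x * h      # meta-gradient step on log step-sizes
    alpha = np.exp(beta)
    w = w + alpha * delta * x                  # base LMS update with adapted step-sizes
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x
    return w, h, beta

# Toy usage on a streaming linear-regression problem.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w, h, beta = np.zeros(3), np.zeros(3), np.full(3, np.log(0.05))
for _ in range(5000):
    x = rng.normal(size=3)
    y = true_w @ x + 0.1 * rng.normal()
    w, h, beta = idbd_step(w, h, beta, x, y)
print("learned weights:", np.round(w, 2))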

Cited by 7 publications (8 citation statements) | References 13 publications

“…Several previously discussed meta-representations have been explored in RL including learning the initial conditions [19], [169], hyperparameters [169], [173], step directions [76] and step sizes [171]. These enable gradient-based learning of a neural policy with fewer environmental interactions and training fast convolutional [38] or recurrent [23], [112] black-box models to synthesize a policy by embedding the environment experience.…”
Section: Methods
confidence: 99%
“…[survey table excerpt: hyperparameter meta-learning methods including HyperRep [20], HyperOpt [66], LHML [68], and MetaTrace [171]; feed-forward models including SNAIL [38] and CNAP [107]; other listed entries include PSD [78]]…”
Section: Gradient RL Evolution
confidence: 99%
“…In reinforcement learning, Kearney et al. (2018) extended IDBD to TD methods for stationary and non-stationary prediction tasks, showing better results than TD methods with a constant step-size or with scalar step-size adaptation. Metatrace (Young et al., 2018) adapted the step-size of the actor-critic algorithm with eligibility traces, in both the scalar and the component-wise step-size case, which can accelerate the learning process of the actor-critic algorithm.…”
Section: IDBD
confidence: 99%
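
To make the idea in the excerpt above concrete, here is a minimal prediction-only sketch of meta-gradient step-size adaptation for linear TD(lambda) with an accumulating eligibility trace: a single scalar log step-size is adjusted online using a trace h of how the weights depend on it. This is a simplified illustration under a semi-gradient approximation, not the published Metatrace (actor-critic) or TIDBD algorithm; names such as td_lambda_meta_step and meta_lr, and the random-walk demo, are made up for the example.

import numpy as np

def td_lambda_meta_step(w, h, log_alpha, e, x, r, x_next, gamma, lam, meta_lr):
    """One linear TD(lambda) step with a meta-gradient update of a scalar log step-size.

    w         : weights of the linear value estimate v(s) = w @ x
    h         : running estimate of d(w)/d(log_alpha)
    e         : accumulating eligibility trace
    log_alpha : log of the scalar step-size alpha
    """
    delta = r + gamma * (w @ x_next) - (w @ x)   # semi-gradient TD error
    e = gamma * lam * e + x                      # eligibility trace

    # Meta step: raise log_alpha when past updates (summarized by h) point in
    # the direction that reduces the current TD error, lower it otherwise.
    log_alpha += meta_lr * delta * (x @ h)
    alpha = np.exp(log_alpha)

    w = w + alpha * delta * e                    # base TD(lambda) update
    # Track how the new weights depend on log_alpha (semi-gradient approximation).
    h = h + alpha * delta * e - alpha * e * (x @ h)
    return w, h, log_alpha, e, delta

# Tiny demo: 5-state random walk, reward +1 on the right terminal.
rng = np.random.default_rng(1)
n = 5
w, h = np.zeros(n), np.zeros(n)
log_alpha = np.log(0.1)
gamma, lam, meta_lr = 1.0, 0.9, 0.005
for episode in range(2000):
    s, e = 2, np.zeros(n)                        # start in the middle, reset trace
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        terminal = s_next < 0 or s_next == n
        x = np.eye(n)[s]
        x_next = np.zeros(n) if terminal else np.eye(n)[s_next]
        w, h, log_alpha, e, _ = td_lambda_meta_step(
            w, h, log_alpha, e, x, r, x_next, gamma, lam, meta_lr)
        if terminal:
            break
        s = s_next
print("values:", np.round(w, 2), "alpha:", round(float(np.exp(log_alpha)), 3))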
“…Meta-gradients have been previously used to learn intrinsic rewards for policy gradient (Zheng et al., 2018) and auxiliary tasks (Veeriah et al., 2019). Meta-gradients have also been used to adapt optimizer parameters (Young et al., 2018; Franceschi et al., 2017). In our setup, we consider the continuous control setting, provide the first implementation of meta-gradients for an algorithm that uses an experience replay, and focus on adapting meta-parameters that encourage soft constraint satisfaction while maximizing expected return.…”
Section: Introduction
confidence: 99%