2015
DOI: 10.48550/arxiv.1511.06342
Preprint

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

Abstract: The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent. Towards this goal, we define a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains. This method, termed "Actor-Mimic", exploits the use of deep reinforcement learning and model compression techniques to train a single policy…
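
Although the abstract is cut off above, the method it describes is a form of policy distillation: a single multitask "mimic" network is regressed onto the soft (Boltzmann) policies induced by the Q-values of per-task expert DQNs. As a rough sketch only, not the paper's exact formulation, the policy-regression part of such an objective could look as follows in PyTorch; the temperature tau, tensor shapes, and all names are assumptions:

import torch
import torch.nn.functional as F

def policy_regression_loss(mimic_logits, expert_q_values, tau=1.0):
    # Soft (Boltzmann) policy derived from the expert DQN's Q-values;
    # the temperature tau is an assumed hyperparameter.
    expert_policy = F.softmax(expert_q_values / tau, dim=-1)
    # Cross-entropy between the expert's soft policy and the mimic
    # network's policy over the same action set.
    log_mimic_policy = F.log_softmax(mimic_logits, dim=-1)
    return -(expert_policy * log_mimic_policy).sum(dim=-1).mean()

Matching soft policies rather than raw Q-values keeps the regression targets bounded and comparable across tasks, which is the usual motivation for distillation-style objectives.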


Cited by 96 publications (139 citation statements)
References 4 publications
“…Learning with multiple objectives is shown to be beneficial in DRL tasks [Wilson et al, 2007, Pinto and Gupta, 2017, Hausman et al, 2018]. Sharing parameters across tasks [Parisotto et al, 2015, Rusu et al, 2015, Teh et al, 2017] usually results in conflicting gradients from different tasks. One way to mitigate this is to explicitly model the similarity between gradients obtained from different tasks [Yu et al, 2020, Zhang and Yeung, 2014, Kendall et al, 2018, Lin et al, 2019, Sener and Koltun, 2018, Du et al, 2018].…”
Section: Related Work
confidence: 99%
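
For context, a common way to "explicitly model the similarity between gradients obtained from different tasks" is to compare per-task gradients by their dot product and project away the conflicting component, in the spirit of PCGrad (Yu et al, 2020). A minimal sketch, assuming flattened per-task gradient vectors as NumPy arrays:

import numpy as np

def project_conflicting(g_i, g_j):
    # If the two task gradients conflict (negative dot product), remove
    # from g_i the component that points against g_j; otherwise keep g_i.
    dot = float(np.dot(g_i, g_j))
    if dot < 0:
        g_i = g_i - (dot / (np.dot(g_j, g_j) + 1e-12)) * g_j
    return g_i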
“…MTL eases deep learning's need for huge amounts of training data and its need to start learning each new task from scratch. Training shared parameters on multiple tasks allows supervision from one task to aid learning in another, and a set of trained shared features can often be reused to instantiate learning on a new task, yielding faster learning through feature reuse [28]. However, the increased data efficiency, transfer robustness, and regularization promised by MTL are never guaranteed and depend strongly on the relationships between the tasks involved [36,2].…”
Section: Related Work
confidence: 99%
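
The parameter-sharing and feature-reuse pattern described in this excerpt is often realized as a shared trunk with one lightweight head per task, so that a new task can reuse the trained trunk and only needs a fresh head. A minimal sketch, with layer sizes, task count, and names chosen purely for illustration:

import torch.nn as nn

class SharedTrunkPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, n_tasks, hidden=256):
        super().__init__()
        # Shared feature extractor trained jointly on all tasks.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One small head per task; adding a task means adding a head.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_tasks)]
        )

    def forward(self, obs, task_id):
        return self.heads[task_id](self.trunk(obs))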
“…In fact, it is worth noting that several works describing the benefits of TL in RL do exist (but they all differ from the study presented in this work): Tirinzoni et al (2018) show that it is possible to successfully transfer value functions across tasks, yet their work does not consider deep networks as function approximators but rather Gaussian mixtures. Parisotto et al (2015) show that it can be beneficial to fine-tune a pre-trained DRL agent, but they consider multi-task learning and policy gradient algorithms as a way of pre-training. Rusu et al (2016) also show that fine-tuning can be beneficial, but in the context of progressive networks and, again, policy gradient techniques.…”
Section: Related Work and Conclusion
confidence: 99%
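
The fine-tuning setups these works compare typically amount to initializing the new-task network from pre-trained weights and continuing gradient updates, sometimes with early layers frozen. A minimal, self-contained sketch of that recipe in PyTorch; the architecture, weight handling, and learning rate are assumptions rather than any specific paper's protocol:

import torch
import torch.nn as nn

def build_policy(obs_dim=84, n_actions=18):
    # Hypothetical small policy network; shapes are illustrative only.
    return nn.Sequential(
        nn.Linear(obs_dim, 256), nn.ReLU(),
        nn.Linear(256, n_actions),
    )

source_policy = build_policy()   # stands in for a pre-trained (e.g. multi-task) agent
target_policy = build_policy()   # network for the new task
target_policy.load_state_dict(source_policy.state_dict())  # transfer the weights

# Fine-tune all parameters on the new task, usually with a reduced learning rate.
optimizer = torch.optim.Adam(target_policy.parameters(), lr=1e-5)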