2016
DOI: 10.48550/arxiv.1611.05763
Preprint

Learning to reinforcement learn

Abstract: In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can …

Cited by 165 publications (250 citation statements)
References 26 publications (49 reference statements)
“…The works in [11], [12] validate a meta-learning framework that can be used in several learning tasks, e.g., it can be applied to both supervised ML (regression and classification) and RL scenarios. Other works propose meta-learning for more specific scenarios, i.e., the update rule and selective copying of weights of deep networks [36], [37], [38] and recurrent networks [39], [40], [41]. In this paper, we design FALCON based on the meta-learning paradigm to obtain fast and accurate scheduling policies.…”
Section: Learning Concepts In Networking (mentioning)
“…Meta-RL tries to enable an agent to solve unseen tasks and environments efficiently. To this end, meta-RL methods infer the task over a latent context (Lin et al., 2020; Rakelly et al., 2019), redesign policy neural network architecture (Duan et al., 2016; Wang et al., 2016; Lan et al., 2019), control exploration strategies (Gupta et al., 2018), and augment state or reward (Raileanu et al., 2020; Florensa et al., 2018). These previous methods commonly focus on the family of tasks described by a family of MDPs which all share the same dynamics and environment but differ in the task specified by the reward function (Lin et al., 2020).…”
Section: Multiple (mentioning)
confidence: 99%
“…Both the fixed payoff and the mean 𝜇 of the risky arm were drawn from a standard Gaussian distribution at the beginning of an episode, which lasted twenty rounds. To build agents that can trade off exploration versus exploitation, we used memory-based meta-learning [Santoro et al., 2016, Wang et al., 2016], which is known to produce near-optimal bandit players [Mikulik et al., 2020, Ortega et al., 2019].…”
Section: Bandits (mentioning)
confidence: 99%
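The bandit setup quoted above (one deterministic arm, one risky arm, both parameters drawn from a standard Gaussian per twenty-round episode) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function and variable names are invented, the risky arm's unit-variance Gaussian noise is an assumption, and the random policy is a placeholder where a meta-trained recurrent agent would condition its choice on the reward history.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_episode(policy, n_rounds=20):
    """Run one two-armed bandit episode as described in the excerpt."""
    # Both the fixed arm's payoff and the risky arm's mean mu are drawn
    # from a standard Gaussian at the start of each episode.
    fixed_payoff = rng.standard_normal()
    risky_mean = rng.standard_normal()
    rewards = []
    for t in range(n_rounds):
        arm = policy(t, rewards)
        if arm == 0:
            r = fixed_payoff  # deterministic arm: same payoff every round
        else:
            # Assumed unit-variance Gaussian noise around the risky mean.
            r = risky_mean + rng.standard_normal()
        rewards.append(float(r))
    return rewards

# Placeholder policy choosing uniformly at random; a memory-based
# meta-learner would instead map the reward history to an arm choice.
random_policy = lambda t, history: int(rng.integers(2))
rewards = play_episode(random_policy)
print(len(rewards))  # 20
```

An agent that balances exploration and exploitation in this environment must estimate, within twenty rounds, whether the risky arm's mean exceeds the fixed payoff, which is exactly the kind of within-episode adaptation the memory-based meta-learning in the excerpt is trained to produce.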