2018
DOI: 10.48550/arxiv.1810.12894
Preprint

Exploration by Random Network Distillation

Abstract: We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard…
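As a rough illustration of the mechanism described in the abstract, the sketch below computes an RND-style exploration bonus in PyTorch. It is a minimal sketch, not the authors' implementation: the observation and feature dimensions, layer sizes, and learning rate are placeholder assumptions, and the paper's reward normalization and scheme for combining intrinsic with extrinsic rewards are not shown.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; not the architecture used in the paper.
obs_dim, feat_dim = 84 * 84, 128

# Fixed, randomly initialized target network: its outputs are the
# "features of the observations" that the predictor must match.
target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
for p in target.parameters():
    p.requires_grad_(False)  # the target is never trained

# Predictor network, trained to imitate the target on visited observations.
predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Return the per-observation prediction error against the fixed target.

    A large error means the predictor has not seen similar observations
    before, so the state is treated as novel and earns a larger bonus.
    """
    with torch.no_grad():
        target_feat = target(obs)
    pred_feat = predictor(obs)
    error = ((pred_feat - target_feat) ** 2).mean(dim=-1)

    # Train the predictor on the same batch so the bonus decays
    # as states are revisited.
    loss = error.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return error.detach()
```

In practice this bonus would be normalized and added to the environment reward; the paper's flexible combination of intrinsic and extrinsic rewards (e.g., via separate value estimates) is outside the scope of this sketch.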

Cited by 207 publications (369 citation statements) · References 25 publications

“…However, for learning tasks on graph-level data, no such general-purpose pretrained teacher networks are available; further, graph databases from different domains differ significantly from each other, which also prevents the application of this type of approach to the GAD task. Random knowledge distillation was originally introduced in [5] to address sparse-reward problems in deep reinforcement learning (DRL). It uses the random distillation errors to measure the novelty of states as additional reward signals that encourage DRL agents' exploration in sparse-reward contexts.…”
Section: Knowledge Distillation (mentioning; confidence: 99%)
“…The aim is to calculate the posterior after iteratively updating on the data. According to [5], our task can then be formulated as the optimization problem below:…”
Section: Theoretical Analysis of GlocalKD (mentioning; confidence: 99%)
“…Existing methods use curiosity or uncertainty as a signal for exploration [Pathak et al., 2017; Burda et al., 2018] so that the learned agent is able to cover a large state space. However, the exploration-exploitation dilemma, together with sample-efficiency considerations, drives us to develop self-imitation learning (SIL) [Oh et al., 2018] methods that focus on exploiting past good experiences for better exploration.…”
Section: Related Work (mentioning; confidence: 99%)