2020
DOI: 10.48550/arxiv.2006.01419
Preprint

Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration

Abstract: Policy entropy regularization is commonly used for better exploration in deep reinforcement learning (RL). However, policy entropy regularization is sample-inefficient in off-policy learning since it does not take the distribution of previous samples stored in the replay buffer into account. In order to take advantage of the previous sample distribution from the replay buffer for sample-efficient exploration, we propose sample-aware entropy regularization, which maximizes the entropy of the weighted sum of the policy…
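As a rough illustration of the idea described in the abstract (maximizing the entropy of a mixture of the policy's action distribution and the action distribution of samples already in the replay buffer), the sketch below shows what such a regularizer could look like for a discrete-action actor-critic. This is not the authors' implementation; the function names, the mixture weight alpha, the entropy coefficient beta, and the assumption that a per-state buffer action distribution buffer_probs is available are all illustrative assumptions.

```python
# Minimal sketch of sample-aware entropy regularization (discrete actions).
# Assumptions (not from the paper's code):
#   policy_probs : pi(a|s) from the current policy, shape (batch, n_actions)
#   buffer_probs : empirical action distribution q(a|s) estimated from the
#                  replay buffer (how q is estimated is left unspecified here)
#   alpha        : mixture weight between policy and buffer distributions
#   beta         : entropy-bonus coefficient
import torch


def sample_aware_entropy(policy_probs: torch.Tensor,
                         buffer_probs: torch.Tensor,
                         alpha: float = 0.5,
                         eps: float = 1e-8) -> torch.Tensor:
    """Entropy of the weighted mixture alpha*pi + (1-alpha)*q, per state."""
    mix = alpha * policy_probs + (1.0 - alpha) * buffer_probs
    return -(mix * torch.log(mix + eps)).sum(dim=-1)


def regularized_actor_objective(q_values: torch.Tensor,
                                policy_probs: torch.Tensor,
                                buffer_probs: torch.Tensor,
                                alpha: float = 0.5,
                                beta: float = 0.01) -> torch.Tensor:
    """Expected Q under the policy plus the sample-aware entropy bonus."""
    expected_q = (policy_probs * q_values).sum(dim=-1)
    bonus = sample_aware_entropy(policy_probs, buffer_probs, alpha)
    return (expected_q + beta * bonus).mean()
```

In contrast to standard policy-entropy regularization, which would use only the entropy of policy_probs, the bonus here also reflects how the buffer's sample distribution overlaps with the policy, which is the sample-aware aspect the abstract refers to.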

Cited by 2 publications (11 citation statements) · References 29 publications (54 reference statements)
“…Maximum entropy RL is also related to probabilistic inference [41,46]. Recently, maximizing the entropy of state distribution instead of the policy distribution [27] and maximizing the entropy considering the previous sample action distribution [26] have been investigated for better exploration.…”
Section: Related Work
Mentioning confidence: 99%
“…Exploration in RL: Exploration is one of the most important issues in model-free RL, as there is the key assumption that all state-action pairs must be visited infinitely often to guarantee the convergence of Q-function [56]. In order to explore diverse state-action pairs in the joint state-action space, various methods have been considered in prior works: intrinsically-motivated reward based on curiosity [5,11], model prediction error [1,10], information gain [26,28,29], and counting states [33,35]. These exploration techniques improve exploration and performance in challenging sparse-reward environments [3,10,13].…”
Section: Related Work
Mentioning confidence: 99%

Adversarially Guided Actor-Critic

Flet-Berliac, Ferret, Pietquin et al., 2021. Preprint.