2020
DOI: 10.48550/arXiv.2006.15199
Preprint

DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning

Rasool Fakoor, Pratik Chaudhari, Alexander J. Smola

Abstract: This paper prescribes a suite of techniques for off-policy Reinforcement Learning (RL) that simplify the training process and reduce the sample complexity. First, we show that simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled. This is in contrast to existing literature, which creates sophisticated off-policy techniques. Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step; existing solutions such as …
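The abstract's first claim, that plain Deterministic Policy Gradient suffices once overestimation bias is controlled, is commonly realized with the clipped double-Q target of Fujimoto et al. (2018), one of the works cited in the statement below. The following is a minimal, hypothetical sketch of that target computation; the names (`q1`, `q2`, `pi_target`) and the use of PyTorch are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch: clipped double-Q target (Fujimoto et al., 2018),
# a standard mechanism for controlling the overestimation bias the
# abstract refers to. All names and shapes are illustrative assumptions.
import torch

def clipped_double_q_target(q1, q2, pi_target, reward, next_obs, done, gamma=0.99):
    """TD target using the minimum of two independent critics.

    q1, q2    -- critic networks: (obs, act) -> value, shape (batch,)
    pi_target -- slow-moving target policy network: obs -> act
    """
    with torch.no_grad():
        next_act = pi_target(next_obs)
        # Taking the min over two critics biases the bootstrap target
        # downward, offsetting the upward bias of the greedy policy update.
        next_q = torch.min(q1(next_obs, next_act), q2(next_obs, next_act))
        return reward + gamma * (1.0 - done) * next_q
```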

Cited by 1 publication (1 citation statement) | References 13 publications
“…There are several strategies to address this issue (Lan et al., 2020; Fujimoto et al., 2018; Fakoor et al., 2020; Hasselt et al., 2016; Hasselt, 2010), albeit with varying degrees of success. […] propose a straightforward approach via a convex combination of the extremes of the Q distribution.…”

Section: Overestimation Bias (citation type: mentioning)
Confidence: 99%
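The "convex combination of the extremes of the Q distribution" mentioned in the statement above can be read as mixing the most pessimistic and most optimistic members of an ensemble of Q estimates. A minimal sketch under that reading, where `beta` and all names are illustrative assumptions rather than the cited paper's implementation:

```python
# Hypothetical sketch: convex combination of the extremes of an ensemble
# of Q estimates. The weight beta and all names are assumptions.
import torch

def convex_extreme_target(q_values: torch.Tensor, beta: float = 0.75) -> torch.Tensor:
    """q_values: Q estimates from an ensemble, shape (ensemble_size, batch)."""
    q_min = q_values.min(dim=0).values  # pessimistic extreme (underestimates)
    q_max = q_values.max(dim=0).values  # optimistic extreme (overestimates)
    # beta near 1 leans pessimistic; beta = 1 with two critics recovers
    # the clipped double-Q (min) target sketched above.
    return beta * q_min + (1.0 - beta) * q_max
```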