2021
DOI: 10.48550/arxiv.2106.08414
Preprint

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

Cited by 2 publications (4 citation statements)
References 0 publications
“…Heavy-Tailed Policy Parametrization: The idea of parametrizing policies via heavy-tailed distributions has appeared in the reinforcement learning literature [23], [37]. The authors in [37] proposed to utilize the beta distribution for policy parametrization but are restricted to dense reward structure environments.…”
Section: Learning From Demonstration (mentioning)
confidence: 99%
“…The authors in [37] proposed to utilize the beta distribution for policy parametrization but are restricted to dense reward structure environments. The authors in [23] have focused on developing heavy-tailed policy gradients to avoid convergence to local maxima and do not explicitly deal with sparse rewards. This work focuses on sparse reward continuous control environments and provides extensive experimental evaluations to support the importance of heavy-tailed policy parametrization.…”
Section: Learning From Demonstration (mentioning)
confidence: 99%
“…But the major challenge is how to induce inherent exploration into the training without reshaping the rewards. Recent work in the literature [20], [21] suggests that one way to handle this is to use heavy-tailed policy parametrizations (such as the Cauchy distribution). Motivated by these factors, in this work we try to find optimal behaviors for outdoor navigation tasks while directly operating under sparse reward settings.…”
Section: Introduction (mentioning)
confidence: 99%
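The statements above refer to heavy-tailed (e.g. Cauchy) policy parametrizations for exploration under sparse rewards. As a minimal illustrative sketch (not the cited papers' implementation; the function name, clipping bounds, and inverse-CDF sampling choice are assumptions here), a Cauchy policy can be sampled as follows, with the location and scale playing the role of outputs from a learned policy network:

```python
import numpy as np

def cauchy_policy_action(loc, scale, action_low=-1.0, action_high=1.0, rng=None):
    """Sample an action from a Cauchy (heavy-tailed) policy, clipped to bounds.

    In practice `loc` and `scale` would be produced by a policy network;
    here they are plain arrays for illustration.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Inverse-CDF sampling for the Cauchy: x = loc + scale * tan(pi * (u - 0.5)).
    # The heavy tails occasionally produce large deviations from `loc`,
    # which is the source of the extra exploration compared with a Gaussian.
    u = rng.uniform(size=np.shape(loc))
    action = loc + scale * np.tan(np.pi * (u - 0.5))
    return np.clip(action, action_low, action_high)
```

Because the Cauchy distribution has no finite variance, samples far from the location parameter occur much more often than under a Gaussian with comparable scale, which is the intuition behind using it for exploration without reward reshaping.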