Evolution Strategies for Direct Policy Search
2008 · DOI: 10.1007/978-3-540-87700-4_43

Cited by 38 publications (31 citation statements) · References 14 publications
“…Each of these three papers compares performance on only a single simple task with a few settings: mountain car with and without observation noise for fixed and random start states [25], pole balancing with no noise and a random start state [24], and double pole balancing with no noise and a fixed start state [23]. This article differs not only in terms of methods compared, but also because we consider more settings (such as evaluating multiple levels of effector noise), perform tests on the significantly more complex task of keepaway, and form domain-independent conclusions about the two classes of methods considered.…”
Section: Related Work
confidence: 99%
“…In this study, we consider the covariance matrix adaptation evolution strategy (CMA-ES, [11,8,27]) for direct policy search, which gives striking results on RL benchmark problems [15,7,14,12]. The CMA-ES adapts the policy as well as parameters of its own search strategy (such as a variable metric) based on a ranking of the policies.…”
Section: Introduction
confidence: 99%
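
To make that mechanism concrete, below is a minimal sketch of direct policy search with CMA-ES, written against the `cma` Python package. The `episode_return` rollout is a hypothetical stand-in (a toy quadratic here) rather than a real environment, and the sketch does not reproduce the cited authors' exact setup. Note that CMA-ES uses only the ranking of the sampled policies' returns, not their magnitudes, which is one reason it is argued to be robust to noisy or rescaled returns.

import numpy as np
import cma  # pip install cma

def episode_return(theta):
    # Hypothetical stand-in: roll out the policy parameterized by
    # theta in the environment and return the (average) episodic return.
    return -float(np.sum((theta - 1.0) ** 2))  # toy objective, optimum at 1.0

# Initial mean of the search distribution (10 policy parameters)
# and initial global step size.
es = cma.CMAEvolutionStrategy(10 * [0.0], 0.5)
while not es.stop():
    thetas = es.ask()                              # sample candidate policies
    losses = [-episode_return(t) for t in thetas]  # CMA-ES minimizes; only ranks matter
    es.tell(thetas, losses)                        # rank-based update of mean,
                                                   # step size, and covariance matrix
best_theta = es.result.xbest                       # best policy parameters found
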
“…Evolution strategies have proven to be powerful methods for reinforcement learning (e.g., see [15,19,24,7,14,12]). It has been argued that they are more robust against noise.…”
Section: Introduction
confidence: 99%
“…Recently, Heidrich-Meisner and Igel (2008a, 2008b, 2008c) performed a systematic comparison between the CMA-ES and policy gradient methods with variable metrics. They discuss similarities and differences between these related approaches.…”
Section: Reinforcement Learning
confidence: 99%
“…Although ESs can be applied to various kinds of machine learning problems, the most elaborate variants are specialized for real-valued parameter spaces. Exemplary applications include supervised learning of feed-forward and recurrent neural networks, direct policy search in reinforcement learning, and model selection for kernel machines (e.g., Mandischer 2002; Igel et al. 2001; Schneider et al. 2004; Igel 2003; Friedrichs and Igel 2005; Kassahun and Sommer 2005; Pellecchia et al. 2005; Mersch et al. 2007; Siebel and Sommer 2007; Heidrich-Meisner and Igel 2008a, 2008b, 2008c; Glasmachers and Igel 2008, see below).…”
Section: Introduction
confidence: 99%