1997
DOI: 10.1016/s0921-8890(97)00043-2

Biped dynamic walking using reinforcement learning

Abstract: This paper presents some results from a study of biped dynamic walking using reinforcement learning. During this study, a hardware biped robot was built, and a new reinforcement learning algorithm as well as a new learning architecture were developed. The biped learned dynamic walking without any previous knowledge of its dynamic model. The Self Scaling Reinforcement learning algorithm was developed in order to deal with the problem of reinforcement learning in continuous action domains. The learning architectur…
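The abstract describes continuous-action reinforcement learning via the Self Scaling Reinforcement algorithm, but the full text is truncated above. Purely as an illustration of the general idea of learning a continuous action from a scalar reinforcement signal (not the paper's algorithm), the minimal Python sketch below uses a Gaussian stochastic action unit in the spirit of Gullapalli-style stochastic real-valued learning; the class name, learning rates, and update rules are all hypothetical.

import numpy as np

class GaussianActionUnit:
    """Hypothetical continuous-action learner: samples actions from a
    Gaussian and nudges its mean toward actions that beat a running
    reinforcement baseline. Illustrative only, not the paper's method."""

    def __init__(self, lr_mean=0.05, lr_std=0.01, init_std=0.5):
        self.mean = 0.0          # current best-guess action
        self.std = init_std      # exploration width
        self.lr_mean = lr_mean
        self.lr_std = lr_std
        self.baseline = 0.0      # running average of reinforcement

    def act(self):
        # Sample a continuous action around the current mean.
        return float(np.random.normal(self.mean, self.std))

    def learn(self, action, reinforcement):
        # Advantage of this trial over the running baseline.
        advantage = reinforcement - self.baseline
        # Move the mean toward (or away from) the sampled action.
        self.mean += self.lr_mean * advantage * (action - self.mean)
        # Shrink exploration when reinforcement exceeds the baseline.
        self.std = max(0.01, self.std - self.lr_std * advantage)
        # Track the baseline with a slow running average.
        self.baseline += 0.1 * (reinforcement - self.baseline)

In such schemes the exploration noise itself is adapted from the reinforcement signal, which is one way continuous-action reinforcement learning can scale its own search; whether and how the paper's Self Scaling Reinforcement algorithm does this is not shown in the truncated abstract.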

Cited by 166 publications (110 citation statements)
References 33 publications
“…Furthermore, in practice, already a single roll-out can suffice for an unbiased gradient estimate (Spall, 2003) viable for a good policy update step, thus reducing the number of roll-outs needed. Finally, this approach has yielded the most real-world robot motor learning results (Benbrahim & Franklin, 1997; Endo et al., 2005; Gullapalli et al., 1994; Kimura & Kobayashi, 1997; Mori et al., 2004; Nakamura et al., 2004; Peters et al., 2005a). In the subsequent two sections, we will strive to explain and improve this type of gradient estimator.…”
Section: Likelihood Ratio Methods and REINFORCE
confidence: 99%
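The quoted passage refers to likelihood-ratio ("REINFORCE"-type) policy gradient estimators, for which even a single roll-out yields an unbiased, if noisy, estimate of the policy gradient. The minimal Python sketch below illustrates that idea for a hypothetical linear-Gaussian policy; the environment interface (env.reset(), env.step() returning state, reward, and a done flag) and all parameter names are assumptions, not taken from the cited papers.

import numpy as np

def single_rollout_gradient(env, theta, sigma=0.1, horizon=100, gamma=0.99):
    """Likelihood-ratio (REINFORCE-style) gradient estimate from one roll-out.

    Policy: a ~ N(theta^T s, sigma^2), hence
    grad_theta log pi(a | s) = (a - theta^T s) * s / sigma^2.
    """
    s = env.reset()
    grad_log, rewards = [], []
    for _ in range(horizon):
        mean = theta @ s
        a = np.random.normal(mean, sigma)
        grad_log.append((a - mean) * s / sigma**2)   # score-function term
        s, r, done = env.step(a)                     # hypothetical interface
        rewards.append(r)
        if done:
            break
    # Discounted return of the whole roll-out (no baseline, for brevity).
    G = sum(gamma**t * r for t, r in enumerate(rewards))
    # REINFORCE estimator: (sum_t grad log pi(a_t | s_t)) * G.
    return G * np.sum(grad_log, axis=0)

A variance-reducing baseline, as discussed in the cited work, would subtract an estimate of the expected return from G before weighting the score terms; the estimate remains unbiased either way.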
“…Policy gradient methods are a notable exception to this statement. Starting with the pioneering work of Gullapalli and colleagues (Benbrahim & Franklin, 1997; Gullapalli, Franklin, & Benbrahim, 1994) in the early 1990s, these methods have been applied to a variety of robot learning problems ranging from simple control tasks (e.g., balancing a ball on a beam (Benbrahim, Doleac, Franklin, & Selfridge, 1992), and pole balancing (Kimura & Kobayashi, 1998)) to complex learning tasks involving many degrees of freedom such as learning of complex motor skills (Gullapalli et al., 1994; Mitsunaga, Smith, Kanda, Ishiguro, & Hagita, 2005; Miyamoto et al., 1995, 1996; Peters & Schaal, 2006; Peters, Vijayakumar, & Schaal, 2005a) and locomotion (Endo, Morimoto, Matsubara, Nakanishi, & Cheng, 2005; Kimura & Kobayashi, 1997; Kohl & Stone, 2004; Mori, Nakamura, Sato, & Ishii, 2004; Sato, Nakamura, & Ishii, 2002; Tedrake, Zhang, & Seung, 2005).…”
Section: Introduction
confidence: 99%
“…Furthermore, in practice, already a single roll-out can suffice for an unbiased gradient estimate [21], [25] viable for a good policy update step, thus reducing the number of roll-outs needed. Finally, this approach has yielded the most real-world robotics results [1], …”
Section: A. General Approaches to Policy Gradient Estimation
confidence: 99%
“…Policy gradient methods are a notable exception to this statement. Starting with the pioneering work of Gullapalli, Franklin and Benbrahim [1], [2] in the early 1990s, these methods have been applied to a variety of robot learning problems ranging from simple control tasks (e.g., balancing a ball on a beam [3], and pole balancing [4]) to complex learning tasks involving many degrees of freedom such as learning of complex motor skills [2], [5], [6] and locomotion [7]-[14]. The advantages of policy gradient methods for robotics are numerous.…”
Section: Introduction
confidence: 99%