“…Policy gradient methods are a notable exception to this statement. Starting with the pioneering work of Gullapalli and colleagues (Benbrahim & Franklin, 1997; Gullapalli, Franklin, & Benbrahim, 1994) in the early 1990s, these methods have been applied to a variety of robot learning problems, ranging from simple control tasks (e.g., balancing a ball on a beam (Benbrahim, Doleac, Franklin, & Selfridge, 1992) and pole balancing (Kimura & Kobayashi, 1998)) to complex learning tasks involving many degrees of freedom, such as learning of complex motor skills (Gullapalli et al., 1994; Mitsunaga, Smith, Kanda, Ishiguro, & Hagita, 2005; Miyamoto et al., 1995, 1996; Peters & Schaal, 2006; Peters, Vijayakumar, & Schaal, 2005a) and locomotion (Endo, Morimoto, Matsubara, Nakanishi, & Cheng, 2005; Kimura & Kobayashi, 1997; Kohl & Stone, 2004; Mori, Nakamura, Sato, & Ishii, 2004; Sato, Nakamura, & Ishii, 2002; Tedrake, Zhang, & Seung, 2005).…”