2008
DOI: 10.1016/j.neunet.2008.02.003
Reinforcement learning of motor skills with policy gradients

Abstract: Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error…


Cited by 721 publications (642 citation statements)
References 45 publications
“…A main third approach aims at direct policy learning, using policy gradient methods [20] or global optimization methods [28]. Direct policy learning most often assumes a parametric policy space Θ; policy learning aims at finding the optimal parameter θ* in the sense of a policy return function J:…”
Section: Related Work
confidence: 99%
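The setup quoted above, a parametric policy space Θ and a return function J to be maximized over θ, can be sketched concretely. The Python snippet below is illustrative only; names such as env_step and estimate_return are hypothetical and not from the paper. It shows a linearly parameterized Gaussian policy and a Monte Carlo estimate of J(θ) from sampled rollouts.

```python
import numpy as np

# Illustrative sketch (env_step, estimate_return are hypothetical names):
# a linearly parameterized Gaussian policy over a parameter space Theta,
# and a Monte Carlo estimate of its return J(theta).

def env_step(state, action):
    """Toy dynamics: drift toward the action; reward for staying near 0."""
    next_state = 0.9 * state + 0.1 * action
    reward = -float(next_state @ next_state)
    return next_state, reward

def policy(theta, state, rng):
    """Gaussian policy: action ~ N(theta . state, 1)."""
    return rng.normal(theta @ state, 1.0)

def estimate_return(theta, horizon=50, n_rollouts=20, seed=0):
    """Monte Carlo estimate of J(theta) = E[sum of rewards per rollout]."""
    rng = np.random.default_rng(seed)
    returns = []
    for _ in range(n_rollouts):
        state, ret = np.ones(2), 0.0
        for _ in range(horizon):
            action = policy(theta, state, rng)
            state, reward = env_step(state, action)
            ret += reward
        returns.append(ret)
    return float(np.mean(returns))

print(estimate_return(np.array([-0.5, -0.5])))
```

Direct policy learning then amounts to searching Θ for the θ* that maximizes this (estimated) return, whether by following a gradient or by global optimization.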
“…The first case is when J is analytically known on a continuous policy space Θ ⊂ ℝ^D. It is then natural to use a gradient-based optimization approach, gradually moving the current policy θ_t along the gradient ∇J [20] (θ_{t+1} = θ_t + α_t ∇_θ J(θ_t)). The main issues concern the adjustment of α_t, the possible use of the inverse Hessian, and the rate of convergence toward a (local) optimum.…”
Section: Related Work
confidence: 99%
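The update quoted in this excerpt, θ_{t+1} = θ_t + α_t ∇_θ J(θ_t), can be demonstrated on a toy case where J is analytically known. The sketch below is an assumption-laden illustration, not code from the paper: the concave quadratic J is a stand-in with an exact gradient, and the fixed step size α_t = 0.1 stands in for the step-size adjustment issue the excerpt raises.

```python
import numpy as np

# Hedged sketch of gradient ascent theta_{t+1} = theta_t + alpha_t * grad J(theta_t)
# for an analytically known J. The quadratic below is a toy stand-in, not the
# paper's objective; grad_J is its exact gradient.

OPT = np.array([1.0, -2.0])  # known maximizer of the toy J

def J(theta):
    """Toy concave return, maximized at theta = OPT."""
    return -np.sum((theta - OPT) ** 2)

def grad_J(theta):
    """Exact gradient of the toy J."""
    return -2.0 * (theta - OPT)

theta = np.zeros(2)
for t in range(100):
    alpha_t = 0.1  # fixed step size; tuning it is the main issue noted above
    theta = theta + alpha_t * grad_J(theta)

print(theta, J(theta))  # theta converges toward the (local) optimum [1, -2]
```

With this step size the iteration contracts toward the optimum at a fixed geometric rate; preconditioning with the inverse Hessian, as the excerpt mentions, would accelerate convergence on ill-conditioned objectives.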