2013
DOI: 10.1016/j.robot.2012.09.012
Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning

Abstract: The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged contrast drastically with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed or stiffness. Learning a control policy to actuate such robots is characterized by the search for a single solution for the task, with a re…


Cited by 50 publications (36 citation statements)
References 44 publications
“…It is applied to learning the "ball in a cup" game, using a DMP as a parameterized policy. A similar algorithm was used in [5] in a pancake flipping task, using a different parameterized policy consisting of a time-dependent mixture of proportional-derivative systems. PI² is a policy search method derived from the first principles of stochastic optimal control [14].…”
Section: Policy Search Methods for Learning Motions from Experience
confidence: 99%
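The EM-based policy search discussed here can be illustrated with a minimal reward-weighted averaging loop: perturb the policy parameters, evaluate each perturbation, and update toward the exploration samples weighted by their (transformed) cost. This is a sketch only; the quadratic `cost` stands in for a real rollout return, and the constants (population size, decay rate) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(theta):
    # Hypothetical quadratic stand-in for a rollout's task cost.
    return np.sum((theta - np.array([1.0, -0.5])) ** 2)

theta = np.zeros(2)              # policy parameters
sigma = 0.5                      # exploration noise, decayed manually

for _ in range(100):
    # Sample perturbed policies and evaluate one "rollout" each.
    eps = rng.normal(0.0, sigma, size=(20, 2))
    costs = np.array([cost(theta + e) for e in eps])
    # EM-style reward-weighted averaging: low cost -> high weight.
    w = np.exp(-(costs - costs.min()) / (costs.std() + 1e-9))
    theta = theta + (w[:, None] * eps).sum(0) / w.sum()
    sigma = max(0.05, 0.95 * sigma)

# theta drifts toward the low-cost region around [1.0, -0.5]
```

In practice the parameter vector would encode a DMP or a mixture of proportional-derivative systems rather than a raw point in task space.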
“…They have convenient properties, such as the possibility to allow non-constrained learning while ensuring qualitative behavior such as global stability at an attractor point or convergence to a limit cycle. There have been numerous successful demonstrations of learning from demonstration and reinforcement learning with such motion representations [18,19,5], learning motions for a large variety of tasks.…”
Section: Representing Motions for Control in Robotics
confidence: 99%
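The "global stability at an attractor point" property mentioned above is typically obtained from a spring-damper dynamical system, as in the transformation system of a DMP: whatever the learned modulation, the unperturbed dynamics converge to the goal. A minimal sketch, with illustrative gains and an explicit-Euler integration (not taken from the paper):

```python
import numpy as np

def attractor_step(x, v, goal, dt=0.01, k=25.0):
    # Critically damped spring-damper: globally stable at `goal`.
    d = 2.0 * np.sqrt(k)              # critical damping coefficient
    a = k * (goal - x) - d * v        # acceleration toward the attractor
    return x + dt * v, v + dt * a

x, v, goal = -3.0, 0.0, 1.0           # arbitrary start, goal at 1.0
for _ in range(2000):                 # integrate for 20 s
    x, v = attractor_step(x, v, goal)

# x settles at the attractor goal = 1.0 with v -> 0
```

Learned forcing terms are usually added on top of this baseline and vanish over time, so the stability guarantee of the underlying system is preserved.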
“…This number is initially chosen to be larger than the number of possible optima, and redundant options are deleted as the algorithm runs, with care taken to avoid premature deletion. Calinon et al. [15] use a Gaussian Mixture Model to estimate the structure of the cost function J, mapping the cost as a probability density function that depends on the policy parameters. Exploration noise is decayed manually, rather than automating it with covariance matrix adaptation.…”
Section: Problem Statement
confidence: 99%
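Mapping cost to a density over policy parameters can be sketched by turning sampled costs into importance weights and fitting a Gaussian mixture with cost-weighted EM, so that mixture components settle on the low-cost regions (the multiple optima). The bimodal `cost`, the component count, and all constants below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def cost(theta):
    # Hypothetical bimodal task cost with optima near -2 and +2.
    return np.minimum((theta - 2.0) ** 2, (theta + 2.0) ** 2)

# Sample candidate policy parameters; turn cost into importance weights.
theta = rng.uniform(-5, 5, 500)
w = np.exp(-cost(theta))

# Cost-weighted EM for a 2-component 1-D Gaussian mixture.
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities under the current mixture.
    p = pi * np.exp(-(theta[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = p / p.sum(1, keepdims=True)
    # M-step: weight by cost-derived w, so mass concentrates on low-cost regions.
    wr = w[:, None] * r
    n = wr.sum(0)
    mu = (wr * theta[:, None]).sum(0) / n
    var = (wr * (theta[:, None] - mu) ** 2).sum(0) / n
    pi = n / n.sum()

# The component means recover the two optima, roughly -2 and +2
```

Deleting redundant options then amounts to pruning components whose mixing weight collapses or whose mean duplicates another component's.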
“…Exploration noise is decayed manually, rather than automating it with covariance matrix adaptation. The key difference between the algorithms in [7], [15] and SODIRS is that the former consider multiple skill options that are able to solve the same task, whereas SODIRS learns multiple options for multiple task variations, without requiring the user to specify the task parameter space.…”
Section: Problem Statement
confidence: 99%