2017
DOI: 10.1016/j.artint.2014.11.005
Model-based contextual policy search for data-efficient generalization of robot skills

Abstract: In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to…
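The hierarchical setup the abstract describes — an upper-level policy that maps a context (e.g. target coordinates to hit) to the parameters of a lower-level controller — is commonly realized as a linear-Gaussian policy. The sketch below illustrates that structure only; the class and variable names are invented for illustration and are not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class UpperLevelPolicy:
    """Illustrative linear-Gaussian upper-level policy pi(theta | s).

    The context s is mapped linearly to the mean of a Gaussian over
    lower-level controller parameters theta; the covariance drives
    exploration.
    """

    def __init__(self, context_dim, param_dim):
        self.W = np.zeros((param_dim, context_dim))  # linear gain on the context
        self.b = np.zeros(param_dim)                 # bias (context-independent mean)
        self.Sigma = np.eye(param_dim)               # exploration covariance

    def sample(self, s):
        """Draw lower-level controller parameters for context s."""
        mean = self.W @ s + self.b
        return rng.multivariate_normal(mean, self.Sigma)

# Example: a 2-D context (e.g. planar target coordinates) selects
# 3 controller parameters.
policy = UpperLevelPolicy(context_dim=2, param_dim=3)
theta = policy.sample(np.array([0.5, -0.2]))
```

Learning then amounts to updating `W`, `b`, and `Sigma` from observed (context, parameters, reward) triples, which is what contextual policy search algorithms such as REPS do.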

Cited by 69 publications (38 citation statements)
References 30 publications
“…Functionality of MBRL is evident in simulation for multiple tasks in low data regimes, including quadrupeds [20] and manipulation tasks [21]. Low-level MBRL control (i.e., with direct motor input signals) of an RC car has been demonstrated experimentally, but the system is of lower dimensionality and has static stability [22].…”
Section: Model-based Reinforcement Learning
confidence: 99%
“…While they map experience gained in one context to another, they do so for estimating discrete outcome probabilities and not for improving the policy. GP-REPS [19] iteratively learns a transition model of the system using a Gaussian process (GP) [20], which is then used to generate trajectories offline for updating the policy. The authors consider generating additional samples for artificial contexts, but they do not define an explicit factorization.…”
Section: Related Work
confidence: 99%
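The GP-REPS idea quoted above — iteratively learn a Gaussian process transition model of the system, then generate trajectories offline from that model to update the policy — can be sketched with scikit-learn's `GaussianProcessRegressor` on a toy one-dimensional system. The dynamics, dimensions, and function names below are invented for illustration and are not from GP-REPS itself.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)

# Collect (state, action) -> next-state transitions from the (unknown)
# true system. Here the true dynamics are a simple linear toy model.
X = rng.uniform(-1, 1, size=(50, 2))          # columns: state, action
y = 0.9 * X[:, 0] + 0.3 * X[:, 1]             # true next state
gp = GaussianProcessRegressor().fit(X, y + rng.normal(scale=0.01, size=50))

def rollout(s0, actions):
    """Generate an artificial trajectory from the learned GP model,
    without querying the real system again."""
    states = [s0]
    for a in actions:
        s_next = gp.predict([[states[-1], a]])[0]
        states.append(s_next)
    return np.array(states)

# Offline model rollout: these samples could feed a policy update.
traj = rollout(0.5, actions=[0.1, -0.2, 0.0])
```

The data efficiency comes from reusing the learned model: many such artificial trajectories can be generated per real-robot episode.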
“…The policy update using contextual REPS is formulated as a constrained optimization problem,

$$\max_{\pi} \int \mu_s(s) \int \pi(\theta|s)\, R(\theta, s)\, d\theta\, ds \quad (12)$$

$$\text{s.t.} \quad \epsilon \geq \int \mu_s(s)\, \mathrm{KL}\big(\pi(\theta|s)\,\|\,q(\theta|s)\big)\, ds, \qquad 1 = \int \pi(\theta|s)\, d\theta \quad (13)$$

For details, please refer to the original study and its extensions [15,16]. Contextual REPS models the policy as a Gaussian policy…”
Section: Learning the Policy for the Desired Grasp Type
confidence: 99%
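A sample-based version of the KL-constrained update in Eqs. (12)–(13) can be sketched as follows. For brevity this sketch drops the context-dependent baseline from the dual (which reduces the update to episodic REPS) and refits a linear-Gaussian policy mean by weighted least squares; the KL bound `epsilon`, the toy reward, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

def reps_weights(returns, epsilon=0.5):
    """Compute REPS sample weights: minimize the dual over the
    temperature eta, then reweight samples by exp(R / eta)."""
    R = returns - returns.max()           # shift for numerical stability
    def dual(eta):                        # g(eta) = eta*eps + eta*log mean exp(R/eta)
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
    eta = minimize_scalar(dual, bounds=(1e-3, 1e6), method="bounded").x
    w = np.exp(R / eta)
    return w / w.sum()

def weighted_linear_fit(S, Theta, w):
    """Weighted maximum-likelihood refit of a linear-Gaussian policy
    mean W s + b (weighted least squares with a tiny ridge term)."""
    X = np.hstack([S, np.ones((len(S), 1))])        # append bias column
    Xw = X * w[:, None]
    A = X.T @ Xw + 1e-6 * np.eye(X.shape[1])        # ridge for stability
    beta = np.linalg.solve(A, X.T @ (Theta * w[:, None]))
    return beta[:-1].T, beta[-1]                    # W, b

# Toy data: contexts S, sampled parameters Theta, and a reward that
# penalizes deviation from an (unknown) target mapping.
S = rng.normal(size=(200, 2))
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
Theta = S @ W_true.T + rng.normal(scale=0.1, size=(200, 3))
returns = -np.sum((Theta - S @ W_true.T) ** 2, axis=1)

w = reps_weights(returns)
W, b = weighted_linear_fit(S, Theta, w)
```

The KL bound in Eq. (13) is what keeps each update close to the previous policy `q`, trading off greediness against information loss.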
“…Grasping motions for different grasp types are learned as independent policies π k l by the contextual relative entropy policy search (REPS) algorithm [15][16][17]. The learned policy π k l generates the motion parameter θ using the local features of the estimated grasping part s.…”
Section: Learning Multiple Grasping Policies
confidence: 99%