2017
DOI: 10.1016/j.artint.2014.11.005
Model-based contextual policy search for data-efficient generalization of robot skills

Abstract: In robotics, lower-level controllers are typically used to make the robot solve a specific task in a fixed context. For example, the lower-level controller can encode a hitting movement while the context defines the target coordinates to hit. However, in many learning problems the context may change between task executions. To adapt the policy to a new context, we utilize a hierarchical approach by learning an upper-level policy that generalizes the lower-level controllers to new contexts. A common approach to…
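The hierarchical setup the abstract describes — an upper-level policy that maps a context (e.g. target coordinates to hit) to the parameters of a lower-level controller — is commonly realized as a linear-Gaussian policy. The sketch below illustrates that structure only; the class and variable names are invented for illustration and are not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class UpperLevelPolicy:
    """Illustrative linear-Gaussian upper-level policy pi(theta | s).

    The context s is mapped linearly to the mean of a Gaussian over
    lower-level controller parameters theta; the covariance drives
    exploration.
    """

    def __init__(self, context_dim, param_dim):
        self.W = np.zeros((param_dim, context_dim))  # linear gain on the context
        self.b = np.zeros(param_dim)                 # bias (context-independent mean)
        self.Sigma = np.eye(param_dim)               # exploration covariance

    def sample(self, s):
        """Draw lower-level controller parameters for context s."""
        mean = self.W @ s + self.b
        return rng.multivariate_normal(mean, self.Sigma)

# Example: a 2-D context (e.g. planar target coordinates) selects
# 3 controller parameters.
policy = UpperLevelPolicy(context_dim=2, param_dim=3)
theta = policy.sample(np.array([0.5, -0.2]))
```

Learning then amounts to updating `W`, `b`, and `Sigma` from observed (context, parameters, reward) triples, which is what contextual policy search algorithms such as REPS do.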

Cited by 69 publications (38 citation statements)
References 30 publications
“…Functionality of MBRL is evident in simulation for multiple tasks in low data regimes, including quadrupeds [20] and manipulation tasks [21]. Low-level MBRL control (i.e., with direct motor input signals) of an RC car has been demonstrated experimentally, but the system is of lower dimensionality and has static stability [22].…”
Section: Model-based Reinforcement Learning
confidence: 99%
“…While they map experience gained in one context to another, they do so for estimating discrete outcome probabilities and not for improving the policy. GP-REPS [19] iteratively learns a transition model of the system using a Gaussian process (GP) [20], which is then used to generate trajectories offline for updating the policy. The authors consider generating additional samples for artificial contexts, but they do not define an explicit factorization.…”
Section: Related Work
confidence: 99%
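The GP-REPS idea quoted above — iteratively learn a Gaussian process transition model of the system, then generate trajectories offline from that model to update the policy — can be sketched with scikit-learn's `GaussianProcessRegressor` on a toy one-dimensional system. The dynamics, dimensions, and function names below are invented for illustration and are not from GP-REPS itself.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)

# Collect (state, action) -> next-state transitions from the (unknown)
# true system. Here the true dynamics are a simple linear toy model.
X = rng.uniform(-1, 1, size=(50, 2))          # columns: state, action
y = 0.9 * X[:, 0] + 0.3 * X[:, 1]             # true next state
gp = GaussianProcessRegressor().fit(X, y + rng.normal(scale=0.01, size=50))

def rollout(s0, actions):
    """Generate an artificial trajectory from the learned GP model,
    without querying the real system again."""
    states = [s0]
    for a in actions:
        s_next = gp.predict([[states[-1], a]])[0]
        states.append(s_next)
    return np.array(states)

# Offline model rollout: these samples could feed a policy update.
traj = rollout(0.5, actions=[0.1, -0.2, 0.0])
```

The data efficiency comes from reusing the learned model: many such artificial trajectories can be generated per real-robot episode.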
“…The policy update using contextual REPS is formulated as a constrained optimization problem,

$$\max_{\pi} \int \mu_s(s) \int \pi(\theta|s)\, R(\theta, s)\, d\theta\, ds \quad (12)$$

$$\text{s.t.} \quad \epsilon \geq \int \mu_s(s)\, \mathrm{KL}\big(\pi(\theta|s)\,\|\,q(\theta|s)\big)\, ds, \qquad 1 = \int \pi(\theta|s)\, d\theta \quad (13)$$

For details, please refer to the original study and its extensions [15,16]. Contextual REPS models the policy as a Gaussian policy…”
Section: Learning the Policy for the Desired Grasp Type
confidence: 99%
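A sample-based version of the KL-constrained update in Eqs. (12)–(13) can be sketched as follows. For brevity this sketch drops the context-dependent baseline from the dual (which reduces the update to episodic REPS) and refits a linear-Gaussian policy mean by weighted least squares; the KL bound `epsilon`, the toy reward, and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

def reps_weights(returns, epsilon=0.5):
    """Compute REPS sample weights: minimize the dual over the
    temperature eta, then reweight samples by exp(R / eta)."""
    R = returns - returns.max()           # shift for numerical stability
    def dual(eta):                        # g(eta) = eta*eps + eta*log mean exp(R/eta)
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
    eta = minimize_scalar(dual, bounds=(1e-3, 1e6), method="bounded").x
    w = np.exp(R / eta)
    return w / w.sum()

def weighted_linear_fit(S, Theta, w):
    """Weighted maximum-likelihood refit of a linear-Gaussian policy
    mean W s + b (weighted least squares with a tiny ridge term)."""
    X = np.hstack([S, np.ones((len(S), 1))])        # append bias column
    Xw = X * w[:, None]
    A = X.T @ Xw + 1e-6 * np.eye(X.shape[1])        # ridge for stability
    beta = np.linalg.solve(A, X.T @ (Theta * w[:, None]))
    return beta[:-1].T, beta[-1]                    # W, b

# Toy data: contexts S, sampled parameters Theta, and a reward that
# penalizes deviation from an (unknown) target mapping.
S = rng.normal(size=(200, 2))
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
Theta = S @ W_true.T + rng.normal(scale=0.1, size=(200, 3))
returns = -np.sum((Theta - S @ W_true.T) ** 2, axis=1)

w = reps_weights(returns)
W, b = weighted_linear_fit(S, Theta, w)
```

The KL bound in Eq. (13) is what keeps each update close to the previous policy `q`, trading off greediness against information loss.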
“…Grasping motions for different grasp types are learned as independent policies π k l by the contextual relative entropy policy search (REPS) algorithm [15][16][17]. The learned policy π k l generates the motion parameter θ using the local features of the estimated grasping part s.…”
Section: Learning Multiple Grasping Policies
confidence: 99%