Factored Contextual Policy Search with Bayesian Optimization

Pinsler, Robert; Karkus, Péter; Kupcsik, Andras; Hsu, David; Lee, Wee Sun

doi:10.1109/icra.2019.8793808

Cited by 3 publications

(4 citation statements)

References 20 publications


“…In this context, hindsight refers to the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended. Prior to our work, hindsight has been limited to off-policy reinforcement learning algorithms that rely on experience replay (Andrychowicz et al, 2017) and policy search based on Bayesian optimization (Karkus et al, 2016; Pinsler et al, 2019).…”

Section: Discussion (mentioning)

confidence: 99%

“…In earlier work, Karkus, Kupcsik, Hsu, and Lee (2016) introduced hindsight to policy search based on Bayesian optimization (Metzen, Fabisch, & Hansen, 2015). This work was recently extended by Pinsler, Karkus, Kupcsik, Hsu, and Lee (2019).…”

Section: Introduction (mentioning)

confidence: 88%

“…Suppose that the reward r(s, g) is known for every combination of state s and goal g, as in previous work on hindsight (Andrychowicz et al, 2017; Karkus et al, 2016; Pinsler et al, 2019). In that case, it is possible to evaluate a trajectory obtained while trying to achieve an original goal g′ for an alternative goal g. This information can be exploited using a central result based on importance sampling.…”

Section: Hindsight Policy Gradients (mentioning)

confidence: 99%


Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients

et al. 2021


A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.


“…Subsequently, the robot skills can be honed by updating the MP parameters through trial-and-error within the framework of Policy Search (PS), a branch of Reinforcement Learning (RL) [2] responsible for resolving which trajectories to evaluate in consideration of the rewards of each execution. Thus, PS algorithms have proved successful in several robotic applications [3], including the contextual case, in which robots are required to adapt to changing environments [4], [5], [6], [7].…”

Section: Introduction (mentioning)

confidence: 99%

Contextual Policy Search for Micro-Data Robot Motion Learning through Covariate Gaussian Process Latent Variable Models

Delgado-Guerrero, Colomé, Torras 2020

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)


In the next few years, the amount and variety of context-aware robotic manipulator applications is expected to increase significantly, especially in household environments. In such spaces, thanks to programming by demonstration, nonexpert people will be able to teach robots how to perform specific tasks, for which the adaptation to the environment is imperative, for the sake of effectiveness and users' safety. These robot motion learning procedures allow the encoding of such tasks by means of parameterized trajectory generators, usually a Movement Primitive (MP) conditioned on contextual variables. However, naively sampled solutions from these MPs are generally suboptimal/inefficient, according to a given reward function. Hence, Policy Search (PS) algorithms leverage the information of the experienced rewards to improve the robot performance over executions, even for new context configurations. Given the complexity of the aforementioned tasks, PS methods face the challenge of exploring in high-dimensional parameter search spaces. In this work, a solution combining Bayesian Optimization, a data-efficient PS algorithm, with covariate Gaussian Process Latent Variable Models, a recent Dimensionality Reduction technique, is presented. It enables reducing dimensionality and exploiting prior demonstrations to converge in few iterations, while also being compliant with context requirements. Thus, contextual variables are considered in the latent search space, from which a surrogate model for the reward function is built. Then, samples are generated in a low-dimensional latent space, and mapped to a context-dependent trajectory. This allows us to drastically reduce the search space with the covariate GPLVM, e.g. from 105 to 2 parameters, plus a few contextual features. Experimentation in two different scenarios proves the data-efficiency and the power of dimensionality reduction of our approach.


MO-BBO: Multi-Objective Bilevel Bayesian Optimization for Robot and Behavior Co-Design

Kim, Pan, Hauser 2021

2021 IEEE International Conference on Robotics and Automation (ICRA)

