2020
DOI: 10.1007/978-3-030-61616-8_24

PBCS: Efficient Exploration and Exploitation Using a Synergy Between Reinforcement Learning and Motion Planning

Cited by 13 publications (10 citation statements)
References 10 publications
“…Recent work has intriguingly found promise in non-reinforcement-learning-based quality diversity (QD) and backplay algorithmic methods, as part of a broader architecture (that can be combined with reinforcement learning) researchers called "Go-Explore" and "Plan, Backplay, Chain Skills" respectively, where agents can revisit useful prior "stepping-stone" environmental states as starting points for further exploration (an ability generally not readily available to physical robotic agents, highlighting the power of the virtual domain) [190,191]. This approach was found to especially improve explorative performance in specific games where reinforcement learning alone fails (e.g., games that require difficult exploration or more global computations) [190,191], and is similar to some foraging behaviors described previously [190]. Collectively, this interesting work suggests that large-scale deep reinforcement learning (which generally only excels during local interactive learning or in controlled environments where complete, and often superhuman [174], global information is provided) is not sufficient for generalized intelligence.…”
Section: Autonomous Game AI for Quasi-Real World Environments
confidence: 99%
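As a rough illustration of the "stepping-stone" mechanism the quoted passage describes, the sketch below keeps an archive of restorable simulator states and restarts exploration from them, in the spirit of Go-Explore. The snapshot/restore interface on `env`, the `cell_fn` discretization, and the random-action rollout are assumptions made for illustration only, not the published implementations of Go-Explore or PBCS.

```python
import random


class SteppingStoneArchive:
    """Sketch of a Go-Explore-style archive of 'stepping-stone' states.

    Each visited state is reduced to a coarse cell key; the archive keeps
    one restorable snapshot per cell so exploration can restart from it.
    """

    def __init__(self):
        self.cells = {}  # cell key -> (simulator snapshot, best score)

    def maybe_add(self, key, snapshot, score):
        # Keep the best-scoring snapshot seen so far for each cell.
        if key not in self.cells or score > self.cells[key][1]:
            self.cells[key] = (snapshot, score)

    def sample_snapshot(self):
        # Uniform over cells; Go-Explore weights cells by visit statistics.
        return random.choice(list(self.cells.values()))[0]


def explore(env, cell_fn, iterations=1000, horizon=50):
    """Revisit archived states and explore onward with random actions."""
    archive = SteppingStoneArchive()
    obs = env.reset()
    archive.maybe_add(cell_fn(obs), env.snapshot(), 0.0)  # seed with start state
    for _ in range(iterations):
        env.restore(archive.sample_snapshot())  # assumed reset-to-state API
        score = 0.0
        for _ in range(horizon):
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            archive.maybe_add(cell_fn(obs), env.snapshot(), score)
            if done:
                break
    return archive
```

As the passage notes, resetting to an arbitrary archived state is generally only possible in a virtual simulator, not on a physical robot, which is why this family of methods highlights the power of the virtual domain.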
“…This divide & conquer type of strategy is a common way to solve a complex RL problem by learning a set of policies on simpler tasks and chaining them to solve the global task [20]. For example, this principle is applied by the Backplay-Chain-Skill part of the Play-Backplay-Chain-Skill (PBCS) algorithm [21]. The Backplay algorithm is used to learn a set of skills backward from the final state of a single demonstration obtained using a planning algorithm.…”
Section: B. Skill-Chaining
confidence: 99%
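To make the backward skill-chaining idea concrete, here is a minimal sketch assuming a single demonstration is available as a sequence of states (for example, produced by a planning algorithm) and that a hypothetical `train_skill` helper encapsulates the RL training of each short policy. It is an illustration of the general principle, not the PBCS authors' implementation.

```python
def backplay_chain_skills(env, demo_states, train_skill, n_skills=5):
    """Illustrative backward skill chaining over a single demonstration.

    `demo_states` is a state sequence (e.g. from a motion planner);
    `train_skill(env, start_state, target_state)` is an assumed helper
    that runs RL and returns a policy reaching `target_state` from
    `start_state`.
    """
    skills = []
    segment = max(1, len(demo_states) // n_skills)
    target = demo_states[-1]  # the final (goal) state of the demonstration
    for end in range(len(demo_states) - 1, 0, -segment):
        start = demo_states[max(0, end - segment)]
        # Learn a short policy that reaches the region already handled
        # by the previously learned (later) skill.
        skills.append(train_skill(env, start, target))
        target = start  # the next, earlier skill must reach this state
    skills.reverse()  # execution order: from the start state toward the goal
    return skills
```

Each skill only has to solve a short, simpler sub-task, which is the divide-and-conquer benefit the quoted passage describes.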
“…The objective of our work is to learn complex manipulation tasks in realistic and obstructed environments. Motion planner augmented RL methods [1][2][3] have shown promising results in solving complex tasks in obstructed environments by combining MP and RL. However, these methods cannot be deployed in real-world settings due to their dependency on state information.…”
Section: Related Work
confidence: 99%
“…Solving complex manipulation tasks in obstructed environments is a challenging problem in deep reinforcement learning (RL) since it requires precise object interactions as well as collision-free movement across obstacles. To tackle this problem, prior works [1][2][3] have proposed to combine the strengths of motion planning (MP) and RL: the safe, collision-free maneuvers of MP and the sophisticated contact-rich interactions of RL, demonstrating promising results. However, MP requires access to the geometric state of an environment for collision checking, which is often not available in the real world, and is also computationally expensive for real-time control.…”
Section: Introduction
confidence: 99%
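The division of labor described in these citing works can be sketched as a simple switching controller: a motion planner drives collision-free transit, and a learned policy takes over for contact-rich interaction near the goal. All interfaces below (`planner.plan`, the observation layout, `rl_policy`) are hypothetical placeholders, not the APIs of the cited methods.

```python
import numpy as np


def hybrid_step(env, planner, rl_policy, obs, goal, contact_radius=0.05):
    """Sketch of one step of a motion-planner-augmented RL controller.

    Far from the target, follow a collision-free waypoint from the planner;
    close to it, hand control to the learned policy for contact-rich
    interaction.
    """
    dist = float(np.linalg.norm(obs["end_effector_pos"] - goal))
    if dist > contact_radius:
        # Motion planning: needs the geometric scene state for collision
        # checking, which is the real-world limitation the passage notes.
        waypoint = planner.plan(obs["end_effector_pos"], goal)[0]
        action = waypoint - obs["end_effector_pos"]
    else:
        # The RL policy handles precise, contact-rich motion near the goal.
        action = rl_policy(obs)
    return env.step(action)
```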