2020
DOI: 10.1007/978-3-030-61616-8_24

PBCS: Efficient Exploration and Exploitation Using a Synergy Between Reinforcement Learning and Motion Planning

Cited by 13 publications (10 citation statements)
References 10 publications
“…Recent work has intriguingly found promise in non-reinforcement-learning-based quality diversity (QD) and backplay algorithmic methods, as part of a broader architecture (that can be combined with reinforcement learning) researchers called "Go-Explore" and "Plan, Backplay, Chain Skills" respectively, where agents can revisit useful prior "stepping-stone" environmental states as starting points for further exploration (an ability generally not readily available to physical robotic agents, highlighting the power of the virtual domain) [190,191]. This approach was found to especially improve explorative performance in specific games where reinforcement learning alone fails (e.g., games that require difficult exploration or more global computations) [190,191], and is similar to some foraging behaviors described previously [190]. Collectively, this interesting work suggests that large-scale deep reinforcement learning (which generally only excels during local interactive learning or in controlled environments where complete, and often superhuman [174], global information is provided) is not sufficient for generalized intelligence.…”
Section: Autonomous Game AI for Quasi-Real World Environments
confidence: 99%
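As a rough illustration of the "stepping-stone" mechanism the quoted passage describes, the sketch below keeps an archive of restorable simulator states and restarts exploration from them, in the spirit of Go-Explore. The snapshot/restore interface on `env`, the `cell_fn` discretization, and the random-action rollout are assumptions made for illustration only, not the published implementations of Go-Explore or PBCS.

```python
import random


class SteppingStoneArchive:
    """Sketch of a Go-Explore-style archive of 'stepping-stone' states.

    Each visited state is reduced to a coarse cell key; the archive keeps
    one restorable snapshot per cell so exploration can restart from it.
    """

    def __init__(self):
        self.cells = {}  # cell key -> (simulator snapshot, best score)

    def maybe_add(self, key, snapshot, score):
        # Keep the best-scoring snapshot seen so far for each cell.
        if key not in self.cells or score > self.cells[key][1]:
            self.cells[key] = (snapshot, score)

    def sample_snapshot(self):
        # Uniform over cells; Go-Explore weights cells by visit statistics.
        return random.choice(list(self.cells.values()))[0]


def explore(env, cell_fn, iterations=1000, horizon=50):
    """Revisit archived states and explore onward with random actions."""
    archive = SteppingStoneArchive()
    obs = env.reset()
    archive.maybe_add(cell_fn(obs), env.snapshot(), 0.0)  # seed with start state
    for _ in range(iterations):
        env.restore(archive.sample_snapshot())  # assumed reset-to-state API
        score = 0.0
        for _ in range(horizon):
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            archive.maybe_add(cell_fn(obs), env.snapshot(), score)
            if done:
                break
    return archive
```

As the passage notes, resetting to an arbitrary archived state is generally only possible in a virtual simulator, not on a physical robot, which is why this family of methods highlights the power of the virtual domain.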
“…This divide & conquer type of strategy is a common way to solve a complex RL problem by learning a set of policies on simpler tasks and chaining them to solve the global task [20]. For example, this principle is applied by the Backplay-Chain-Skill part of the Play-Backplay-Chain-Skill (PBCS) algorithm [21]. The Backplay algorithm is used to learn a set of skills backward from the final state of a single demonstration obtained using a planning algorithm.…”
Section: B. Skill-Chaining
confidence: 99%
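To make the backward skill-chaining idea concrete, here is a minimal sketch assuming a single demonstration is available as a sequence of states (for example, produced by a planning algorithm) and that a hypothetical `train_skill` helper encapsulates the RL training of each short policy. It is an illustration of the general principle, not the PBCS authors' implementation.

```python
def backplay_chain_skills(env, demo_states, train_skill, n_skills=5):
    """Illustrative backward skill chaining over a single demonstration.

    `demo_states` is a state sequence (e.g. from a motion planner);
    `train_skill(env, start_state, target_state)` is an assumed helper
    that runs RL and returns a policy reaching `target_state` from
    `start_state`.
    """
    skills = []
    segment = max(1, len(demo_states) // n_skills)
    target = demo_states[-1]  # the final (goal) state of the demonstration
    for end in range(len(demo_states) - 1, 0, -segment):
        start = demo_states[max(0, end - segment)]
        # Learn a short policy that reaches the region already handled
        # by the previously learned (later) skill.
        skills.append(train_skill(env, start, target))
        target = start  # the next, earlier skill must reach this state
    skills.reverse()  # execution order: from the start state toward the goal
    return skills
```

Each skill only has to solve a short, simpler sub-task, which is the divide-and-conquer benefit the quoted passage describes.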
“…The objective of our work is to learn complex manipulation tasks in realistic and obstructed environments. Motion planner augmented RL methods [1][2][3] have shown promising results in solving complex tasks in obstructed environments by combining MP and RL. However, these methods cannot be deployed in real-world settings due to their dependency on state information.…”
Section: Related Work
confidence: 99%
“…Solving complex manipulation tasks in obstructed environments is a challenging problem in deep reinforcement learning (RL) since it requires precise object interactions as well as collision-free movement across obstacles. To tackle this problem, prior works [1][2][3] have proposed to combine the strengths of motion planning (MP) and RL: the safe, collision-free maneuvers of MP and the sophisticated contact-rich interactions of RL, demonstrating promising results. However, MP requires access to the geometric state of an environment for collision checking, which is often not available in the real world, and is also computationally expensive for real-time control.…”
Section: Introduction
confidence: 99%
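The division of labor described in these citing works can be sketched as a simple switching controller: a motion planner drives collision-free transit, and a learned policy takes over for contact-rich interaction near the goal. All interfaces below (`planner.plan`, the observation layout, `rl_policy`) are hypothetical placeholders, not the APIs of the cited methods.

```python
import numpy as np


def hybrid_step(env, planner, rl_policy, obs, goal, contact_radius=0.05):
    """Sketch of one step of a motion-planner-augmented RL controller.

    Far from the target, follow a collision-free waypoint from the planner;
    close to it, hand control to the learned policy for contact-rich
    interaction.
    """
    dist = float(np.linalg.norm(obs["end_effector_pos"] - goal))
    if dist > contact_radius:
        # Motion planning: needs the geometric scene state for collision
        # checking, which is the real-world limitation the passage notes.
        waypoint = planner.plan(obs["end_effector_pos"], goal)[0]
        action = waypoint - obs["end_effector_pos"]
    else:
        # The RL policy handles precise, contact-rich motion near the goal.
        action = rl_policy(obs)
    return env.step(action)
```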