2009
DOI: 10.1007/978-3-642-02921-9_46
Combining Policy Search with Planning in Multi-agent Cooperation

Cited by 7 publications (7 citation statements)
References 11 publications
“…The methods based on Dyna and prioritized sweeping have not been demonstrated to address sparse rewards, which are required to map discrete high-level actions and states to low-level subgoal states in a scalable manner. Ma and Cameron (2009) present the policy search planning method, in which they extend the policy search GPOMDP (Baxter and Bartlett, 2001) towards the multi-agent domain of robotic soccer. Herein, they map symbolic plans to policies using an expert knowledge database.…”
Section: Integrating Learning and Planning
confidence: 99%
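
For readers unfamiliar with the GPOMDP estimator that the cited work extends, the following minimal sketch shows its core update: a discounted eligibility trace of policy log-gradients and a running average of reward-weighted traces. The toy two-state environment, the softmax parameterization, and all hyper-parameter values are illustrative assumptions and are not taken from Ma and Cameron (2009) or Baxter and Bartlett (2001).

```python
# Minimal sketch of the GPOMDP gradient estimator (Baxter & Bartlett, 2001)
# referenced above. The toy environment, parameter shapes and hyper-parameters
# are illustrative assumptions, not taken from the cited papers.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
theta = np.zeros((N_STATES, N_ACTIONS))   # softmax policy parameters


def policy(state):
    """Softmax action probabilities for one observed state."""
    prefs = theta[state]
    p = np.exp(prefs - prefs.max())
    return p / p.sum()


def step(state, action):
    """Hypothetical two-state world: action 0 keeps the state, action 1 flips it.
    Reward 1 is paid only for ending up in state 1."""
    next_state = state if action == 0 else 1 - state
    return next_state, float(next_state == 1)


def gpomdp_gradient(beta=0.9, horizon=5000):
    """Run one trajectory and return the GPOMDP estimate of the policy gradient."""
    z = np.zeros_like(theta)      # eligibility trace
    delta = np.zeros_like(theta)  # running gradient estimate
    state = 0
    for t in range(horizon):
        probs = policy(state)
        action = rng.choice(N_ACTIONS, p=probs)
        next_state, reward = step(state, action)

        # grad of log pi(a|s) for a softmax policy: indicator(a) - probs, row s
        grad_log = np.zeros_like(theta)
        grad_log[state] = -probs
        grad_log[state, action] += 1.0

        z = beta * z + grad_log                  # discounted eligibility trace
        delta += (reward * z - delta) / (t + 1)  # running average of r * z
        state = next_state
    return delta


# One gradient-ascent step on the estimated average reward.
theta += 0.5 * gpomdp_gradient()
```

In this form the trace-decay parameter beta trades bias against variance; the multi-agent extension discussed in the citation would replace the toy observations with each soccer agent's observations and condition the policy on the symbolic plan, which is not shown here.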
“…Existing approaches that integrate action planning with reinforcement learning have not been able to map subgoals to low-level motion trajectories for realistic continuous-space robotic applications (Grounds and Kudenko, 2005; Ma and Cameron, 2009) because they rely on a continuous dense reward signal that is proportional to manually defined metrics that estimate how well a problem has been solved (Ng et al., 1999). The manual definition of such metrics, also known as reward shaping, is a non-trivial problem itself because the semantic distance to a continuous goal is often not proportional to the metric distance.…”
Section: Introduction
confidence: 99%
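
As a concrete reference point for the reward-shaping issue raised in this statement, the sketch below shows potential-based shaping in the sense of Ng et al. (1999) with a hand-crafted distance potential. The goal position, the potential function, and the discount factor are illustrative assumptions, not details from the cited works.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999), the kind
# of dense, manually defined reward signal discussed above. The distance-based
# potential, goal position and discount factor are illustrative assumptions.
import numpy as np

GOAL = np.array([1.0, 0.0])   # hypothetical goal position in a continuous space
GAMMA = 0.99


def potential(state):
    """Hand-crafted potential: negative Euclidean distance to the goal.
    This is exactly the manual metric the passage calls non-trivial, because
    being metrically close to the goal need not mean the task is semantically
    almost solved."""
    return -np.linalg.norm(state - GOAL)


def shaped_reward(sparse_reward, state, next_state):
    """Sparse environment reward plus the shaping term
    F(s, s') = gamma * Phi(s') - Phi(s), which preserves optimal policies."""
    return sparse_reward + GAMMA * potential(next_state) - potential(state)


# A step that moves closer to the goal gets a positive shaping bonus
# even though the sparse environment reward is still 0.
s, s_next = np.array([0.0, 0.0]), np.array([0.5, 0.0])
print(shaped_reward(0.0, s, s_next))
```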
“…The two teams will be matched against five other teams that are RoboCup participants. In total, seven teams compete in the simulation: Tesis, NewTesis, Brainstormers (University of Osnabrueck, Germany) [8], Helios (National Institute of Advanced Industrial Science and Technology, Japan), OxBlue (University of Oxford, UK) [9], OPU_hana (Osaka Prefecture, Japan), and UvA_Trilearn (Universiteit van Amsterdam, Holland) [10].…”
Section: Results and Discussion (Hasil dan Pembahasan)
unclassified
“…Examples include evolutionary algorithms for gait optimization (Chernova and Veloso 2004; Röfer et al. 2004) or optimization of team tactics (Nakashima et al. 2005), unsupervised and supervised learning in computer vision tasks (Kaufmann et al. 2004; Li et al. 2003; Treptow and Zell 2004) and lower-level control tasks (Oubbati et al. 2005). RL methods have been used to learn cooperative behaviors in the simulation league (Ma et al. 2008) as well as for real robots (Asada et al. 1999) and to learn walking patterns on humanoid robots (Ogino et al. 2004). Furthermore, Stone's keep-away game is a popular standardized reinforcement learning problem derived from the simulation league (Stone et al. 2005).…”
Section: Related Work
confidence: 99%