2009
DOI: 10.1007/978-3-642-02921-9_46
Combining Policy Search with Planning in Multi-agent Cooperation

Cited by 7 publications (7 citation statements)
References 11 publications
“…The methods based on Dyna and prioritized sweeping have not been demonstrated to address sparse rewards, which are required to map discrete high-level actions and states to low-level subgoal states in a scalable manner. Ma and Cameron (2009) present the policy search planning method, in which they extend the policy search GPOMDP (Baxter and Bartlett, 2001) towards the multi-agent domain of robotic soccer. Herein, they map symbolic plans to policies using an expert knowledge database.…”
Section: Integrating Learning and Planning
confidence: 99%
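
For readers unfamiliar with the GPOMDP estimator that the cited work extends, the following minimal sketch shows its core update: a discounted eligibility trace of policy log-gradients and a running average of reward-weighted traces. The toy two-state environment, the softmax parameterization, and all hyper-parameter values are illustrative assumptions and are not taken from Ma and Cameron (2009) or Baxter and Bartlett (2001).

```python
# Minimal sketch of the GPOMDP gradient estimator (Baxter & Bartlett, 2001)
# referenced above. The toy environment, parameter shapes and hyper-parameters
# are illustrative assumptions, not taken from the cited papers.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
theta = np.zeros((N_STATES, N_ACTIONS))   # softmax policy parameters


def policy(state):
    """Softmax action probabilities for one observed state."""
    prefs = theta[state]
    p = np.exp(prefs - prefs.max())
    return p / p.sum()


def step(state, action):
    """Hypothetical two-state world: action 0 keeps the state, action 1 flips it.
    Reward 1 is paid only for ending up in state 1."""
    next_state = state if action == 0 else 1 - state
    return next_state, float(next_state == 1)


def gpomdp_gradient(beta=0.9, horizon=5000):
    """Run one trajectory and return the GPOMDP estimate of the policy gradient."""
    z = np.zeros_like(theta)      # eligibility trace
    delta = np.zeros_like(theta)  # running gradient estimate
    state = 0
    for t in range(horizon):
        probs = policy(state)
        action = rng.choice(N_ACTIONS, p=probs)
        next_state, reward = step(state, action)

        # grad of log pi(a|s) for a softmax policy: indicator(a) - probs, row s
        grad_log = np.zeros_like(theta)
        grad_log[state] = -probs
        grad_log[state, action] += 1.0

        z = beta * z + grad_log                  # discounted eligibility trace
        delta += (reward * z - delta) / (t + 1)  # running average of r * z
        state = next_state
    return delta


# One gradient-ascent step on the estimated average reward.
theta += 0.5 * gpomdp_gradient()
```

In this form the trace-decay parameter beta trades bias against variance; the multi-agent extension discussed in the citation would replace the toy observations with each soccer agent's observations and condition the policy on the symbolic plan, which is not shown here.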
“…Existing approaches that integrate action planning with reinforcement learning have not been able to map subgoals to low-level motion trajectories for realistic continuous-space robotic applications (Grounds and Kudenko, 2005; Ma and Cameron, 2009) because they rely on a continuous dense reward signal that is proportional to manually defined metrics that estimate how well a problem has been solved (Ng et al., 1999). The manual definition of such metrics, also known as reward shaping, is a non-trivial problem itself because the semantic distance to a continuous goal is often not proportional to the metric distance.…”
Section: Introduction
confidence: 99%
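
As a concrete reference point for the reward-shaping issue raised in this statement, the sketch below shows potential-based shaping in the sense of Ng et al. (1999) with a hand-crafted distance potential. The goal position, the potential function, and the discount factor are illustrative assumptions, not details from the cited works.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999), the kind
# of dense, manually defined reward signal discussed above. The distance-based
# potential, goal position and discount factor are illustrative assumptions.
import numpy as np

GOAL = np.array([1.0, 0.0])   # hypothetical goal position in a continuous space
GAMMA = 0.99


def potential(state):
    """Hand-crafted potential: negative Euclidean distance to the goal.
    This is exactly the manual metric the passage calls non-trivial, because
    being metrically close to the goal need not mean the task is semantically
    almost solved."""
    return -np.linalg.norm(state - GOAL)


def shaped_reward(sparse_reward, state, next_state):
    """Sparse environment reward plus the shaping term
    F(s, s') = gamma * Phi(s') - Phi(s), which preserves optimal policies."""
    return sparse_reward + GAMMA * potential(next_state) - potential(state)


# A step that moves closer to the goal gets a positive shaping bonus
# even though the sparse environment reward is still 0.
s, s_next = np.array([0.0, 0.0]), np.array([0.5, 0.0])
print(shaped_reward(0.0, s, s_next))
```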
“…The two teams will be matched against five other teams that are RoboCup participants. In total, seven teams compete in the simulation: Tesis, NewTesis, Brainstormers (University of Osnabrueck, Germany) [8], Helios (National Institute of Advanced Industrial Science and Technology, Japan), OxBlue (University of Oxford, UK) [9], OPU_hana (Osaka Prefecture, Japan), and UvA_Trilearn (Universiteit van Amsterdam, Holland) [10].…”
Section: Results and Discussion (Hasil dan Pembahasan)
unclassified
“…Examples include evolutionary algorithms for gait optimization (Chernova and Veloso 2004; Röfer et al. 2004) or optimization of team tactics (Nakashima et al. 2005), unsupervised and supervised learning in computer vision tasks (Kaufmann et al. 2004; Li et al. 2003; Treptow and Zell 2004) and lower-level control tasks (Oubbati et al. 2005). RL methods have been used to learn cooperative behaviors in the simulation league (Ma et al. 2008) as well as for real robots (Asada et al. 1999) and to learn walking patterns on humanoid robots (Ogino et al. 2004). Furthermore, Stone's keep-away game is a popular standardized reinforcement learning problem derived from the simulation league (Stone et al. 2005).…”
Section: Related Work
confidence: 99%