“…In particular, gradient-based subgoal discovery with FNNs or RNNs decomposes RL tasks into subtasks for RL submodules (Schmidhuber, 1991b; Schmidhuber and Wahnsiedler, 1992). Numerous alternative HRL techniques have been proposed (e.g., Ring, 1991, 1994; Jameson, 1991; Tenenberg et al., 1993; Weiss, 1994; Moore and Atkeson, 1995; Precup et al., 1998; Dietterich, 2000b; Menache et al., 2002; Doya et al., 2002; Ghavamzadeh and Mahadevan, 2003; Barto and Mahadevan, 2003; Samejima et al., 2003; Bakker and Schmidhuber, 2004; Whiteson et al., 2005; Simsek and Barto, 2008). While HRL frameworks such as Feudal RL (Dayan and Hinton, 1993) and options (Sutton et al., 1999b; Barto et al., 2004; Singh et al., 2005) do not directly address the problem of automatic subgoal discovery, HQ-Learning (Wiering and Schmidhuber, 1998a) automatically decomposes POMDPs (Sec.…”
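The core idea shared by these HRL approaches, decomposing a long task into subtasks delimited by subgoals, can be illustrated with a minimal, purely hypothetical sketch (not an implementation of any cited algorithm): a high-level controller hands fixed subgoal states to a low-level controller, which issues primitive actions on a 1-D corridor until each subgoal is reached.

```python
# Hypothetical two-level control sketch: a high-level controller supplies
# subgoals; a low-level controller executes primitive moves (+1/-1) on a
# 1-D corridor until the current subgoal is reached.

def low_level_policy(state, subgoal):
    """Primitive action moving one step toward the subgoal."""
    return 1 if subgoal > state else -1

def run_hierarchy(start, goal, subgoals):
    """Pursue each subgoal in order, then the final goal."""
    state = start
    trajectory = [state]
    for sg in list(subgoals) + [goal]:
        while state != sg:
            state += low_level_policy(state, sg)
            trajectory.append(state)
    return trajectory

traj = run_hierarchy(start=0, goal=9, subgoals=[3, 6])
print(traj)
```

In the subgoal-discovery methods cited above, the subgoal list would of course be learned rather than hand-specified; the sketch only shows how a hierarchy shortens the horizon each controller must handle.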