2007
DOI: 10.1613/jair.2368

Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man

Abstract: In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be…
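To make the approach concrete, here is a minimal sketch of a rule-based decision-list policy in the spirit of the abstract: rules combine high-level observation modules with action modules, and the highest-priority matching rule selects the active action module. The module names, thresholds, and single-step first-match arbitration are illustrative assumptions; in the paper itself, actions are temporally extended and may run concurrently.

```python
# Minimal sketch of a rule-based decision-list policy: rules pair a predicate
# over high-level observation modules with an action module; earlier rules in
# the list have higher priority. All module names and thresholds below are
# illustrative assumptions, not the paper's actual module set.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    condition: Callable[[Dict[str, float]], bool]  # predicate over observation modules
    action: str                                    # action module to activate


class DecisionListPolicy:
    """Ordered list of rules; the first matching rule wins."""

    def __init__(self, rules: List[Rule], default_action: str):
        self.rules = rules
        self.default_action = default_action

    def act(self, obs: Dict[str, float]) -> str:
        for rule in self.rules:
            if rule.condition(obs):
                return rule.action
        return self.default_action


# Hypothetical high-level observations produced by observation modules.
obs = {"nearest_ghost_dist": 3.0, "nearest_power_pill_dist": 6.0, "ghost_frightened": 0.0}

policy = DecisionListPolicy(
    rules=[
        Rule(lambda o: o["ghost_frightened"] > 0.5, action="ChaseGhost"),
        Rule(lambda o: o["nearest_ghost_dist"] < 4.0, action="FleeGhost"),
        Rule(lambda o: o["nearest_power_pill_dist"] < 8.0, action="ToPowerPill"),
    ],
    default_action="ToNearestDot",
)

print(policy.act(obs))  # -> "FleeGhost" under these illustrative observations
```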

Cited by 52 publications (63 citation statements)
References 29 publications (27 reference statements)
“…As a reward comparison, the agent presented in [8] achieves 8186 points on average using CE optimisation of hand-coded rules and 6382 points using CE optimisation of random rules. However, note that CERRLA was designed to learn in any relational environment (resulting in a loss in performance), whereas the agent in [8] was designed specifically for playing Ms. Pac-Man.…”
Section: Results (mentioning; confidence: 99%)
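The CE (cross-entropy) optimisation referred to above selects which rules from a large pool enter the policy. Below is a minimal sketch of the cross-entropy method for this kind of combinatorial selection, assuming a Bernoulli distribution over rule inclusion; the sample sizes, smoothing factor, and placeholder `evaluate` objective are illustrative and not taken from either paper.

```python
# Cross-entropy sketch for selecting a subset of rules from a pool.
# Each rule i is included with probability p[i]; elite samples pull p towards
# subsets that score well. `evaluate` (e.g. average game score of the
# resulting policy) is a placeholder the reader must supply.

import numpy as np

def cross_entropy_select(evaluate, n_rules, n_samples=100, elite_frac=0.1,
                         n_iters=50, alpha=0.7, rng=None):
    rng = rng or np.random.default_rng(0)
    p = np.full(n_rules, 0.5)                             # inclusion probabilities
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        samples = rng.random((n_samples, n_rules)) < p    # boolean rule subsets
        scores = np.array([evaluate(mask) for mask in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]    # best-scoring subsets
        p = alpha * elite.mean(axis=0) + (1 - alpha) * p  # smoothed update
    return p > 0.5                                        # final rule selection

# Toy stand-in objective: reward including the first 5 of 20 rules.
best = cross_entropy_select(lambda m: m[:5].sum() - 0.1 * m[5:].sum(), n_rules=20)
print(best.astype(int))
```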
“…However, note that CERRLA was designed to learn in any relational environment (resulting in a loss in performance), whereas the agent in [8] was designed specifically for playing Ms. Pac-Man. Furthermore, the Ms. Pac-Man environments used are likely to be different in execution.…”
Section: Results (mentioning; confidence: 99%)
“…Another example for a policy parametrization is to use domain-specific building blocks, such as motor primitives, as it was done in [27] and [29] to optimize the gait of the AIBO quadrupedal robot. A different kind of policy representation is used in [50] to learn a policy for the game of Ms. Pac-Man; here the policy is represented by a list of domain-specific parameterized rules. Section 4 described a third option for the global optimization part: Gaussian process optimization [35,3].…”
Section: Direct Policy Search (mentioning; confidence: 99%)
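For the Gaussian process optimization mentioned as a global policy-search option, a minimal sketch is given below: a GP surrogate is fit to (policy parameters, return) pairs and the next candidate is chosen by an upper-confidence-bound acquisition over random candidates. The kernel, parameter bounds, candidate sampling, and toy objective are assumptions for illustration, not taken from the cited work.

```python
# Sketch of Gaussian-process-based policy search: fit a GP surrogate to
# (policy parameters, episodic return) pairs, then pick the next parameters
# by an upper-confidence-bound acquisition over random candidates.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_policy_search(evaluate_policy, dim, n_init=5, n_iters=20, kappa=2.0, rng=None):
    rng = rng or np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(n_init, dim))           # initial policy parameters
    y = np.array([evaluate_policy(x) for x in X])         # observed returns
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
    for _ in range(n_iters):
        gp.fit(X, y)
        cand = rng.uniform(-1, 1, size=(256, dim))        # random candidate parameters
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(mu + kappa * sigma)]      # UCB acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate_policy(x_next))
    return X[np.argmax(y)], y.max()

# Toy objective standing in for an episodic return.
best_x, best_y = gp_policy_search(lambda x: -np.sum((x - 0.3) ** 2), dim=2)
print(best_x, best_y)
```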
“…Recently, Szita and Lorincz [10] proposed a different approach to playing Ms. Pac-Man. The aim is to develop a simple rule-based policy, where rules are organized into action modules and a decision about which direction to move is made based on priorities assigned to the modules in the agent.…”
Section: Agents for Playing Pac-Man and Ms. Pac-Man (mentioning; confidence: 99%)