2007
DOI: 10.1613/jair.2368

Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man

Abstract: In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be…
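To make the approach concrete, here is a minimal sketch of a rule-based decision-list policy in the spirit of the abstract: rules combine high-level observation modules with action modules, and the highest-priority matching rule selects the active action module. The module names, thresholds, and single-step first-match arbitration are illustrative assumptions; in the paper itself, actions are temporally extended and may run concurrently.

```python
# Minimal sketch of a rule-based decision-list policy: rules pair a predicate
# over high-level observation modules with an action module; earlier rules in
# the list have higher priority. All module names and thresholds below are
# illustrative assumptions, not the paper's actual module set.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    condition: Callable[[Dict[str, float]], bool]  # predicate over observation modules
    action: str                                    # action module to activate


class DecisionListPolicy:
    """Ordered list of rules; the first matching rule wins."""

    def __init__(self, rules: List[Rule], default_action: str):
        self.rules = rules
        self.default_action = default_action

    def act(self, obs: Dict[str, float]) -> str:
        for rule in self.rules:
            if rule.condition(obs):
                return rule.action
        return self.default_action


# Hypothetical high-level observations produced by observation modules.
obs = {"nearest_ghost_dist": 3.0, "nearest_power_pill_dist": 6.0, "ghost_frightened": 0.0}

policy = DecisionListPolicy(
    rules=[
        Rule(lambda o: o["ghost_frightened"] > 0.5, action="ChaseGhost"),
        Rule(lambda o: o["nearest_ghost_dist"] < 4.0, action="FleeGhost"),
        Rule(lambda o: o["nearest_power_pill_dist"] < 8.0, action="ToPowerPill"),
    ],
    default_action="ToNearestDot",
)

print(policy.act(obs))  # -> "FleeGhost" under these illustrative observations
```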

Cited by 52 publications (63 citation statements)
References 29 publications (27 reference statements)
“…As a reward comparison, the agent presented in [8] achieves 8186 points on average using CE optimisation of hand-coded rules and 6382 points using CE optimisation of random rules. However, note that CERRLA was designed to learn in any relational environment (resulting in a loss in performance), whereas the agent in [8] was designed specifically for playing Ms. Pac-Man.…”
Section: Results (mentioning; confidence: 99%)
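The CE (cross-entropy) optimisation referred to above selects which rules from a large pool enter the policy. Below is a minimal sketch of the cross-entropy method for this kind of combinatorial selection, assuming a Bernoulli distribution over rule inclusion; the sample sizes, smoothing factor, and placeholder `evaluate` objective are illustrative and not taken from either paper.

```python
# Cross-entropy sketch for selecting a subset of rules from a pool.
# Each rule i is included with probability p[i]; elite samples pull p towards
# subsets that score well. `evaluate` (e.g. average game score of the
# resulting policy) is a placeholder the reader must supply.

import numpy as np

def cross_entropy_select(evaluate, n_rules, n_samples=100, elite_frac=0.1,
                         n_iters=50, alpha=0.7, rng=None):
    rng = rng or np.random.default_rng(0)
    p = np.full(n_rules, 0.5)                             # inclusion probabilities
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        samples = rng.random((n_samples, n_rules)) < p    # boolean rule subsets
        scores = np.array([evaluate(mask) for mask in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]    # best-scoring subsets
        p = alpha * elite.mean(axis=0) + (1 - alpha) * p  # smoothed update
    return p > 0.5                                        # final rule selection

# Toy stand-in objective: reward including the first 5 of 20 rules.
best = cross_entropy_select(lambda m: m[:5].sum() - 0.1 * m[5:].sum(), n_rules=20)
print(best.astype(int))
```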
“…However, note that CERRLA was designed to learn in any relational environment (resulting in a loss in performance), whereas the agent in [8] was designed specifically for playing Ms. Pac-Man. Furthermore, the Ms. Pac-Man environments used are likely to be different in execution.…”
Section: Results (mentioning; confidence: 99%)
“…Another example for a policy parametrization is to use domain-specific building blocks, such as motor primitives, as it was done in [27] and [29] to optimize the gait of the AIBO quadrupedal robot. A different kind of policy representation is used in [50] to learn a policy for the game of Ms. Pac-Man; here the policy is represented by a list of domain-specific parameterized rules. Section 4 described a third option for the global optimization part: Gaussian process optimization [35,3].…”
Section: Direct Policy Search (mentioning; confidence: 99%)
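For the Gaussian process optimization mentioned as a global policy-search option, a minimal sketch is given below: a GP surrogate is fit to (policy parameters, return) pairs and the next candidate is chosen by an upper-confidence-bound acquisition over random candidates. The kernel, parameter bounds, candidate sampling, and toy objective are assumptions for illustration, not taken from the cited work.

```python
# Sketch of Gaussian-process-based policy search: fit a GP surrogate to
# (policy parameters, episodic return) pairs, then pick the next parameters
# by an upper-confidence-bound acquisition over random candidates.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_policy_search(evaluate_policy, dim, n_init=5, n_iters=20, kappa=2.0, rng=None):
    rng = rng or np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(n_init, dim))           # initial policy parameters
    y = np.array([evaluate_policy(x) for x in X])         # observed returns
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
    for _ in range(n_iters):
        gp.fit(X, y)
        cand = rng.uniform(-1, 1, size=(256, dim))        # random candidate parameters
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(mu + kappa * sigma)]      # UCB acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate_policy(x_next))
    return X[np.argmax(y)], y.max()

# Toy objective standing in for an episodic return.
best_x, best_y = gp_policy_search(lambda x: -np.sum((x - 0.3) ** 2), dim=2)
print(best_x, best_y)
```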
“…Recently, Szita and Lorincz [10] proposed a different approach to playing Ms. Pac-Man. The aim is to develop a simple rule-based policy, where rules are organized into action modules and a decision about which direction to move is made based on priorities assigned to the modules in the agent.…”
Section: Agents for Playing Pac-Man and Ms. Pac-Man (mentioning; confidence: 99%)