Proceedings of the 13th International Conference on the Foundations of Digital Games 2018
DOI: 10.1145/3235765.3235812
A Hybrid Search Agent in Pommerman

Cited by 19 publications (14 citation statements); references 5 publications.
“…Tree-based techniques such as Monte Carlo Tree Search (MCTS) have been shown to perform well in Pommerman (Resnick et al., 2020; Osogami and Takahashi, 2019; Zhou et al., 2018). However, they require much more computational infrastructure than pure RL, in addition to significant human effort for evaluating the trajectories.…”
Section: Related Work
confidence: 99%
“…The unique challenges in Pommerman have attracted many researchers to this environment. Their approaches can be broadly categorized into model-free RL [16,17,13,14,18] and tree-search-based RL [19,20,11,21,12]. In addition, [22] is an excellent review of Pommerman, its practical implications, and its limitations.…”
Section: Related Work
confidence: 99%
“…In addition, [22] is an excellent review of Pommerman, its practical implications, and its limitations. A comparison of search techniques including MCTS, breadth-first search, and flat Monte Carlo [20] shows that in the fully observable FFA mode, MCTS is able to beat simpler and hand-crafted solutions. An extension of this study [19], using the Rolling Horizon Evolutionary Algorithm (RHEA), concludes that more offensive strategies (like RHEA with a high rate of bomb placing) are normally also riskier, due to inadvertent suicides.…”
Section: Related Work
confidence: 99%
“…Their experiment in Pommerman shows that Backplay provides significant gains in sample complexity, with a stark advantage in sparse-reward settings. The Hybrid Search Agent [25] focused on search-based methods in Pommerman with resource-intensive forward models. Their results show that a heuristic agent using depth-limited tree search can slightly outperform hand-made heuristics.…”
Section: Action Space
confidence: 99%
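The depth-limited tree search mentioned in the last statement can be sketched as follows. This is only an illustrative outline, not the paper's actual agent: the names (`legal_actions`, `step`, `heuristic`) stand in for a forward model and leaf evaluator that the cited work would supply.

```python
def depth_limited_search(state, depth, legal_actions, step, heuristic):
    """Exhaustively search to `depth`, scoring leaf states with `heuristic`.

    Returns (best_value, best_action). `legal_actions(state)` lists the
    available moves, `step(state, action)` is the forward model, and
    `heuristic(state)` evaluates a leaf. All three are placeholders for
    whatever environment model the agent has access to.
    """
    if depth == 0:
        return heuristic(state), None
    best_value, best_action = float("-inf"), None
    for action in legal_actions(state):
        child = step(state, action)  # simulate one move with the forward model
        value, _ = depth_limited_search(child, depth - 1,
                                        legal_actions, step, heuristic)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action
```

On a toy one-dimensional state where actions add -1 or +1 and the heuristic is the state value itself, a depth-2 search from 0 picks +1 and reaches value 2.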