The StarCraft Multi-Agent Challenge

Samvelyan, Mikayel; Rashid, Tabish; Witt, Christian Schroeder de; Farquhar, Gregory; Nardelli, Nantas; Rudner, Tim G. J.; Hung, Chia-Man; Torr, Philip H. S.; Foerster, Jakob; Whiteson, Shimon

doi:10.48550/arxiv.1902.04043

Cited by 72 publications

(143 citation statements)

References 23 publications

Supporting

Mentioning

139

Contrasting

“…We exclude some of the more complex and popular team competition games, e.g. Google Football Environment [72], StarCraft 2 [73] etc. because those are too heavy on computational resources as well as it is more complicated to analyze and differentiate the effects of various incentives.…”

Section: MARL Under Team Competition (mentioning)

confidence: 99%

Offsetting Unequal Competition through RL-assisted Incentive Schemes

Koley, Maiti, Bhattacharya et al. 2022

Preprint

This paper investigates the dynamics of competition among organizations with unequal expertise. Multi-agent reinforcement learning has been used to simulate and understand the impact of various incentive schemes designed to offset such inequality. We design Touch-Mark, a game based on the well-known multi-agent particle environment, where two teams (weak, strong) with unequal but changing skill levels compete against each other. For training such a game, we propose a novel controller-assisted multi-agent reinforcement learning algorithm, C-MADDPG, which empowers each agent with an ensemble of policies along with a supervised controller that, by selectively partitioning the sample space, triggers intelligent role division among the teammates. Using C-MADDPG as an underlying framework, we propose an incentive scheme for the weak team such that the final rewards of both teams become the same. We find that in spite of the incentive, the final reward of the weak team falls short of the strong team. On inspection, we realize that an overall incentive scheme for the weak team does not incentivize the weaker agents within that team to learn and improve. To offset this, we now specially incentivize the weaker player to learn and as a result, observe that the weak team beyond an initial phase performs at par with the stronger team. The final goal of the paper has been to formulate a dynamic incentive scheme that continuously balances the reward of the two teams. This is achieved by devising an incentive scheme enriched with an RL agent which takes minimum information from the environment.

“…Prior work has proposed a number of benchmarks for reinforcement learning, which are often either explicitly episodic (Todorov et al, 2012;Beattie et al, 2016;Chevalier-Boisvert et al, 2018), or consist of games that are implicitly episodic after the player dies or completes the game (Bellemare et al, 2013;Silver et al, 2016). In addition, RL benchmarks have been proposed in the episodic setting for studying a number of orthogonal questions, such multi-task learning (Bellemare et al, 2013;Yu et al, 2020), sequential task learning (Wołczyk et al, 2021), generalization (Cobbe et al, 2020), and multi-agent learning (Samvelyan et al, 2019;. These benchmarks differ from our own in that we propose to study the challenge of autonomy.…”

Section: Related Work (mentioning)

confidence: 99%

Autonomous Reinforcement Learning: Formalism and Benchmarking

Sharma, Xu, Sardana et al. 2021

Preprint

Reinforcement learning (RL) provides a naturalistic framing for learning through trial and error, which is appealing both because of its simplicity and effectiveness and because of its resemblance to how humans and animals acquire skills through experience. However, real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world, whereas common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts. This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms, such as robots. In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials. We introduce a simulated benchmark EARL around this framework, containing a set of diverse and challenging simulated tasks reflective of the hurdles introduced to learning when only a minimal reliance on extrinsic intervention can be assumed. We show that standard approaches to episodic RL and existing approaches struggle as interventions are minimized, underscoring the need for developing new algorithms for reinforcement learning with a greater focus on autonomy.

“…Modern machine-learning algorithms based on deep-neural nets are able to play a large variety of distinct games [69], such as Go, chess and Starcraft, or console games like Atari. We consider a setup where the opponents may be either human players that are drawn from a standard internet-based matchmaking system, standalone competing algorithms, or agents participating in a multiagent challenge setup [70]. Of minor relevance to the question at hand is the expertise level of the architecture and whether game-specific algorithms are used.…”

Section: Multi-gaming Environments (mentioning)

confidence: 99%

Emotions as abstract evaluation criteria in biological and artificial intelligences

Gros 2021

Preprint

Biological as well as advanced artificial intelligences (AIs) need to decide which goals to pursue. We review nature's solution to the time allocation problem, which is based on a continuously readjusted categorical weighting mechanism we experience introspectively as emotions. One observes phylogenetically that the available number of emotional states increases hand in hand with the cognitive capabilities of animals and that raising levels of intelligence entail ever larger sets of behavioral options. Our ability to experience a multitude of potentially conflicting feelings is in this view not a leftover of a more primitive heritage, but a generic mechanism for attributing values to behavioral options that cannot be specified at birth. In this view, emotions are essential for understanding the mind. For concreteness, we propose and discuss a framework which mimics emotions on a functional level. Based on time allocation via emotional stationarity (TAES), emotions are implemented as abstract criteria, such as satisfaction, challenge and boredom, which serve to evaluate activities that have been carried out. The resulting timeline of experienced emotions is compared with the 'character' of the agent, which is defined in terms of a preferred distribution of emotional states. The long-term goal of the agent, to align experience with character, is achieved by optimizing the frequency for selecting individual tasks. Upon optimization, the statistics of emotion experience becomes stationary.
