Evolutionary Dynamics of Multi-Agent Learning: A Survey

Bloembergen, Daan; Tuyls, Karl; Hennes, Daniel; Kaisers, Michael

doi:10.1613/jair.4818

Cited by 214 publications

(184 citation statements)

References 65 publications

Supporting

Mentioning: 177

Contrasting


“…This system of coupled differential equations models the temporal dynamics of the populations' strategy profiles when they interact, and can be extended readily to the general K-wise interaction case (see Supplementary Material Section 5.2.2 for more details). The replicator dynamics provide useful insights into the micro-dynamical characteristics of games, revealing strategy flows, basins of attraction, and equilibria [34] when visualized on a trajectory plot over the strategy simplex (e.g., Fig. 4).…”

Section: Micro-model: Replicator Dynamics (mentioning)

confidence: 99%

α-Rank: Multi-Agent Evaluation by Evolution

Omidshafiei, Papadimitriou, Piliouras, et al. 2019. Sci Rep.

Self Cite

122


We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model’s ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley’s Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
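
The citation statement for this entry describes the replicator dynamics as a system of coupled differential equations over the populations' strategy profiles. A minimal sketch of the two-population case follows, assuming explicit Euler integration and a textbook Prisoner's Dilemma payoff matrix; the function names, step size, and payoffs are illustrative choices, not taken from the cited paper.

```python
import numpy as np

def replicator_step(x, y, A, B, dt=0.01):
    """One explicit Euler step of the two-population replicator dynamics.

    x, y : mixed strategies (points on the simplex) of populations 1 and 2.
    A, B : payoff matrices; A[i, j] and B[i, j] are the payoffs to
           population 1 and population 2 when 1 plays i and 2 plays j.
    """
    fx = A @ y      # fitness of each pure strategy in population 1
    fy = B.T @ x    # fitness of each pure strategy in population 2
    # Each strategy grows in proportion to its advantage over the average.
    x_new = x + dt * x * (fx - x @ fx)
    y_new = y + dt * y * (fy - y @ fy)
    return x_new, y_new

def trajectory(x0, y0, A, B, steps=5000, dt=0.01):
    """Integrate from (x0, y0) and return the list of visited states."""
    x, y = np.asarray(x0, float), np.asarray(y0, float)
    path = [(x.copy(), y.copy())]
    for _ in range(steps):
        x, y = replicator_step(x, y, A, B, dt)
        path.append((x.copy(), y.copy()))
    return path

# Assumed example game: Prisoner's Dilemma, actions (cooperate, defect).
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = A.T  # symmetric game: population 2 faces the same payoffs
path = trajectory([0.9, 0.1], [0.9, 0.1], A, B)
x_final, y_final = path[-1]
```

Plotting the first component of each state in `path` for several starting points would give the kind of trajectory plot over the strategy simplex that the statement mentions. In this assumed game both populations flow toward defection, the unique Nash equilibrium.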



“…For large β, imitation becomes increasingly deterministic. It is noteworthy, especially for those who are familiar with other learning literature, that this parameter plays a similar role as the temperature factor in the Boltzmann exploration mechanism usually used in Reinforcement Learning to balance between exploitation and exploration [5]. Indeed, as exploration is introduced below, β balances between greedily mimicking more successful interaction partners and randomly switching to the alternatives available in the population.…”

Section: Evolutionary Dynamics In Finite Populations (mentioning)

confidence: 99%
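
The statement above compares the imitation strength β with the temperature in Boltzmann exploration. The sketch below illustrates that comparison under one assumption: that imitation follows the Fermi (pairwise comparison) rule commonly used in finite-population models, which the quoted passage does not name explicitly. Function and parameter names are hypothetical.

```python
import math

def imitation_probability(payoff_self, payoff_other, beta):
    """Assumed Fermi rule: probability of copying the other agent's strategy.

    beta = 0 gives a coin flip; large beta makes imitation of the
    better-performing agent nearly deterministic.
    """
    z = beta * (payoff_other - payoff_self)
    # Numerically stable logistic function (avoids overflow in exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def boltzmann_policy(q_values, temperature):
    """Boltzmann (softmax) exploration over action-value estimates."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Weak selection: imitation is close to random.
p_weak = imitation_probability(1.0, 2.0, beta=0.01)
# Strong selection: the more successful agent is almost always imitated.
p_strong = imitation_probability(1.0, 2.0, beta=100.0)
# With two options, softmax at temperature 1/beta matches the Fermi rule.
p_softmax = boltzmann_policy([1.0, 2.0], temperature=0.5)[1]
p_fermi = imitation_probability(1.0, 2.0, beta=2.0)
```

Under this assumption β acts as an inverse temperature: raising β has the same sharpening effect on imitation that lowering the temperature has on Boltzmann action selection.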

“…5 When facing FAKE players who commit but then do not contribute, COMP F can choose to take immediately the compensation as stated in the agreement thereby ceasing the group interaction for the rest of the commitment time (R − 1). Yet, the commitment player may see that although the expected number F was not attained, there is still sufficient participation to make it worthwhile to continue for the remaining rounds.…”

Section: Lenience In Long-term Commitments (mentioning)

confidence: 99%

Evolution of commitment and level of participation in public goods games

Pereira, Lenaerts 2016. Auton Agent Multi-Agent Syst.


Before engaging in a group venture agents may require commitments from other members in the group, and based on the level of acceptance (participation) they can then decide whether it is worthwhile joining the group effort. Here, we show in the context of public goods games and using stochastic evolutionary game theory modelling, which implies imitation and mutation dynamics, that arranging prior commitments while imposing a minimal participation when interacting in groups induces agents to behave cooperatively. Our analytical and numerical results show that if the cost of arranging the commitment is sufficiently small compared to the cost of cooperation, commitment arranging behavior is frequent, leading to a high level of cooperation in the population. Moreover, an optimal participation level emerges depending both on the dilemma at stake and on the cost of arranging the commitment. Namely, the harsher the common good dilemma is, and the costlier it becomes to arrange the commitment, the more participants should explicitly commit to the agreement to ensure the success of the joint venture. Furthermore, considering that commitment deals may last for more than one encounter, we show that commitment proposers can be lenient in case of short-term agreements, yet should be strict in case of long-term interactions.

The Anh Han (corresponding author)


“…ODHC mitigates dynamic changes in environments (Zeng et al., 2007), and is a hill climbing-based algorithm that explores new peaks where convergence of the search for a better solution is hastened. Evolutionary game theory was employed to gain insight into the environment dynamics in MARL systems (Bloembergen et al., 2015). [table fragment: (Trojanowski and Michalewicz, 1999); memory usage (Cobb, 1990); PS: macromutation operator (Esquivel and Coello, 2004), local change discovery (Cui et al., 2009); ACO: enhanced communication (Dréo and Siarry, 2006), memory usage (Mavrovouniotis and Yang, 2011); AIS: dynamic clonal selection (Kim and Bentley, 2002); Other: self-organization and reproduction (Annunziato et al., 2001), Bayesian optimization (Kobliha et al., 2006)] …”

Section: Particle Swarm Optimization (mentioning)

confidence: 99%

Prediction-Based Multi-Agent Reinforcement Learning in Inherently Non-Stationary Environments

Marinescu, Dusparić, Clarke 2017. ACM Trans. Auton. Adapt. Syst.


Multi-agent reinforcement learning (MARL) is a widely researched technique for decentralised control in complex large-scale autonomous systems. Such systems often operate in environments that are continuously evolving and where agents’ actions are non-deterministic, so-called inherently non-stationary environments. When there are inconsistent results for agents acting on such an environment, learning and adapting is challenging. In this article, we propose P-MARL, an approach that integrates prediction and pattern change detection abilities into MARL and thus minimises the effect of non-stationarity in the environment. The environment is modelled as a time-series, with future estimates provided using prediction techniques. Learning is based on the predicted environment behaviour, with agents employing this knowledge to improve their performance in real time. We illustrate P-MARL’s performance in a real-world smart grid scenario, where the environment is heavily influenced by non-stationary power demand patterns from residential consumers. We evaluate P-MARL in three different situations, where agents’ action decisions are independent, simultaneous, and sequential. Results show that all methods outperform traditional MARL, with sequential P-MARL achieving best results.

