Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence 2023
DOI: 10.24963/ijcai.2023/16
Beyond Strict Competition: Approximate Convergence of Multi-agent Q-Learning Dynamics

Abstract: The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics …
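The smooth Q-Learning dynamics the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 2x2 general-sum payoff matrices and the parameter values are hypothetical. Each agent tracks a Q-value per action, nudges it toward the expected payoff against the opponent's current mixed strategy, and plays a Boltzmann (softmax) policy whose temperature plays the role of the exploration rate.

```python
import numpy as np

def softmax(q, temp):
    """Boltzmann policy: the temperature `temp` smooths the argmax."""
    z = np.exp(q / temp)
    return z / z.sum()

# Hypothetical 2x2 general-sum game (illustrative, not from the paper):
# A[i, j] is agent 1's payoff and B[i, j] agent 2's payoff when
# agent 1 plays action i and agent 2 plays action j.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[0.5, 1.0], [1.0, 0.5]])

alpha, temp, steps = 0.1, 1.0, 2000  # learning rate, exploration rate
Q1, Q2 = np.zeros(2), np.zeros(2)
for _ in range(steps):
    x, y = softmax(Q1, temp), softmax(Q2, temp)
    # Move each Q-value toward the expected payoff of that action
    # against the opponent's current mixed strategy.
    Q1 += alpha * (A @ y - Q1)
    Q2 += alpha * (B.T @ x - Q2)

print(softmax(Q1, temp))  # agent 1's resulting mixed strategy
```

With a sufficiently large exploration rate `temp`, iterates of this kind settle near a smoothed equilibrium even outside the zero-sum setting, which is the regime the paper's convergence results concern.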

Cited by 2 publications (6 citation statements). References 0 publications.
“…so that each agent has N₀ = N − 1 neighbours. This also corresponds to the case analysed by (Sanders, Farmer, and Galla 2018) and (Hussain, Belardinelli, and Piliouras 2023), in which it was predicted that the boundary between stable and unstable learning dynamics is affected by the total number of agents.…”
Section: Methods (supporting)
Confidence: 57%
“…In particular, the region in which learning converges to a fixed point seems to vanish as the number of agents increases. This result is supported by that of (Hussain, Belardinelli, and Piliouras 2023), in which a lower bound on exploration rates was determined so that Q-Learning dynamics converge to a unique equilibrium. Again, it was shown that this lower bound increases with the number of agents.…”
Section: Model and Contributions (mentioning)
Confidence: 52%