Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information (1994)
DOI: 10.1109/21.293490

Cited by 272 publications (197 citation statements)
References 16 publications (4 reference statements)
“…In zero-sum games, the L_R-I scheme converges to the equilibrium point if one exists in pure strategies, while the L_R-εP scheme can approach a mixed equilibrium arbitrarily closely (Lakshmivarahan & Narendra, 1981). In general non-zero-sum games, it has been shown that when the automata use an L_R-I scheme and the game has a unique pure equilibrium point, convergence is guaranteed (Sastry et al., 1994). When the game matrix has more than one pure equilibrium, which equilibrium is found depends on the initial conditions.…”
Section: Learning Automata Games (mentioning)
confidence: 99%
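The L_R-I (linear reward-inaction) update referenced in this excerpt is standard: on a reward, probability mass moves toward the action just played; on a penalty, nothing changes. Below is a minimal Python sketch of that update; the function name, default step size, and the assumption of a reward signal in [0, 1] are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def lri_update(p, action, beta, lam=0.01):
    """Linear reward-inaction (L_R-I) update for one learning automaton.

    p      : current action-probability vector (sums to 1)
    action : index of the action that was just played
    beta   : reward signal in [0, 1]; beta = 0 means no update (inaction)
    lam    : step-size parameter; small values favor convergence to pure equilibria
    """
    p = p.copy()
    p -= lam * beta * p        # shrink every component proportionally ...
    p[action] += lam * beta    # ... and give the freed mass to the played action
    return p
```

Because the total shrinkage equals the mass added back to the played action, the vector stays a valid probability distribution after every step.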
“…Wheeler et al. have shown that a set of decentralized learning automata is able to control a finite Markov chain with unknown transition probabilities and rewards (Wheeler & Narendra, 1986). In Sastry, Phansalkar, and Thathachar (1994) it is shown that a team of learning automata playing a general N-person stochastic game converges to a Nash equilibrium if each team member uses the linear reward-inaction (L_R-I) algorithm. Nowe and Verbeeck (2002) first introduced interconnected learning automata as a model for stigmergetic communication in multi-agent systems to solve MMDPs.…”
Section: Related Work (mentioning)
confidence: 99%
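As a concrete illustration of the convergence result attributed to Sastry, Phansalkar, and Thathachar (1994), the sketch below has two L_R-I automata (reusing lri_update from the sketch above) repeatedly play a 2x2 game with a unique pure Nash equilibrium, each observing only its own Bernoulli reward. The payoff matrices, seed, step size, and horizon are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expected-payoff matrices (rows: player 1's actions,
# columns: player 2's actions) with a unique pure Nash equilibrium at (0, 0).
A = np.array([[0.9, 0.2],
              [0.3, 0.1]])   # player 1's expected reward
B = np.array([[0.8, 0.3],
              [0.2, 0.1]])   # player 2's expected reward

p1 = np.array([0.5, 0.5])    # both automata start from uniform action probabilities
p2 = np.array([0.5, 0.5])

for _ in range(50_000):
    a1 = rng.choice(2, p=p1)
    a2 = rng.choice(2, p=p2)
    # Decentralized, incomplete information: each player sees only its own
    # Bernoulli reward, never the opponent's action or payoff.
    beta1 = float(rng.random() < A[a1, a2])
    beta2 = float(rng.random() < B[a1, a2])
    p1 = lri_update(p1, a1, beta1)
    p2 = lri_update(p2, a2, beta2)

print(p1, p2)   # both vectors should concentrate near the equilibrium actions (0, 0)
```

Starting from other initial probability vectors can steer the automata toward a different pure equilibrium when several exist, which is exactly the initial-condition dependence noted in the first excerpt.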
“…If the cluster accepts a foreign load that extends its queue length to 5, on average half of the local jobs will be delayed. We conclude that the adjustment algorithm is too simplistic; the clusters should use more sophisticated techniques for learning their optimal participation level, similar to those used in other games with incomplete information, such as [31], or genetic algorithms [28].…”
Section: Experimental Analysis (mentioning)
confidence: 99%
“…Among other approaches, including those based on reinforcement learning, maximum-entropy reinforcement learning, smoothed best response, or fictitious play, it is important to highlight the contributions in [3], [7], [8], [18]-[23]. The main drawbacks of these contributions can be summarized in five points: (i) the point of convergence is a probability distribution over the set of all available channels and power-allocation policies [21], [22], [30], [31]. Therefore, the optimization is often over the expectation of the performance metric, and optimality is often claimed only in the asymptotic regime.…”
Section: A State of the Art (mentioning)
confidence: 99%