Abstract: We consider multi-player repeated games involving a large number of players with large strategy spaces and enmeshed utility structures. In these "large-scale" games, players are inherently faced with limitations in both their observational and computational capabilities. Accordingly, players in large-scale games need to make their decisions using algorithms that accommodate limitations in information gathering and processing. This disqualifies some of the well-known decision-making models such as "Fictitious Play"…
“…With a broad set of existing results for learning in potential games (Arslan & Shamma, 2004; Fudenberg & Levine, 1998; Marden, Arslan, & Shamma, 2009b; Marden, Young, Arslan, & Shamma, 2009; Shamma & Arslan, 2005; Young, 1993, 1998, 2005), the primary focus of this work is on the development of methodologies for designing the interaction framework as a potential game while meeting constraints and objectives relevant to multiagent systems, e.g., locality of agent objective functions, and efficiency guarantees for resulting equilibria, among many others. Unfortunately, the framework of potential games is not broad enough to meet this diverse set of challenges as several limitations are beginning to emerge.…”
“…Although forms of joint strategy fictitious play using each belief update have been proven to converge to Nash equilibria in potential games [11], here we focus on one specific variant with fading memory and inertia.…”
“…For a generic potential game Γ, by Theorem 3.1 of [11], fading memory JSFP with inertia converges almost surely to a pure Nash equilibrium, as long as 0 < ξ < 1.…”
Section: Then the agent continues to play a_i^t = a_i^{t-1}; otherwise…
mentioning
confidence: 99%
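The rule behind fading-memory JSFP with inertia is compact: each player keeps an exponentially discounted average of the hypothetical payoff of each of her own actions against the opponents' realized play, and best-responds to those averages except that, with a fixed probability, she simply repeats her previous action. The following is a minimal sketch under assumed notation; payoff, rho, and inertia are our names, and the snippet's parameter ξ refers to the theorem's condition on the learning parameters rather than to any identifier below.

```python
import numpy as np

def jsfp_with_inertia(payoff, n_players, n_actions, n_rounds=2000,
                      rho=0.1, inertia=0.3, seed=0):
    """Fading-memory JSFP with inertia (sketch; names are illustrative).

    payoff(i, a): player i's utility at joint action a (tuple of ints).
    rho:          weight on the newest hypothetical payoff (fading memory).
    inertia:      probability of repeating the previous action.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((n_players, n_actions))          # hypothetical-payoff averages
    a = rng.integers(n_actions, size=n_players)   # arbitrary initial joint action
    for _ in range(n_rounds):
        # Each player evaluates every own action against the realized play.
        for i in range(n_players):
            for ai in range(n_actions):
                alt = a.copy()
                alt[i] = ai
                V[i, ai] = (1 - rho) * V[i, ai] + rho * payoff(i, tuple(alt))
        nxt = a.copy()
        for i in range(n_players):
            if rng.random() > inertia:            # with prob. 1 - inertia,
                nxt[i] = int(np.argmax(V[i]))     # best respond to the averages
        a = nxt
    return tuple(int(x) for x in a)
```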
“…Based on this information, the individuals update their estimates of the reward functions (line 6) and update their internal states according to their action adaptation process (line 7). A learning parameter ε is set according to the learning policy in use (line 8), and a new action for the day is selected according to either the unperturbed process (lines 9–10) or randomly sampled in order to explore the joint action space (lines 11–12). For these games, we wish to have behaviour converge to an equilibrium, thereby providing a distributed method of computing (locally) optimal joint strategies with only noisy evaluations of the global reward function.…”
Section: Algorithms
mentioning
confidence: 99%
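Read as pseudocode, the daily loop described in this snippet interleaves reward estimation with an ε-perturbed action choice. A minimal sketch, with hypothetical callables standing in for the paper's components:

```python
import numpy as np

def daily_loop(observe_reward, estimate_update, adapt, epsilon_schedule,
               n_actions, n_days=500, seed=0):
    """One agent's loop, paraphrasing the cited pseudocode (sketch).

    observe_reward(a):        noisy evaluation of the global reward at action a.
    estimate_update(q, a, r): returns updated reward estimates, e.g. Q-learning.
    adapt(state, q):          action adaptation process; returns (state, action).
    epsilon_schedule(t):      the learning policy, e.g. lambda t: t ** -0.5.
    All names are illustrative, not the paper's identifiers.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros(n_actions)                   # reward estimates
    state = None
    a = int(rng.integers(n_actions))
    for t in range(1, n_days + 1):
        r = observe_reward(a)                 # noisy reward for today's action
        q = estimate_update(q, a, r)          # cf. line 6: update estimates
        state, greedy = adapt(state, q)       # cf. line 7: internal state
        eps = epsilon_schedule(t)             # cf. line 8: learning policy
        if rng.random() < eps:                # cf. lines 11-12: explore
            a = int(rng.integers(n_actions))
        else:                                 # cf. lines 9-10: unperturbed process
            a = greedy
    return a, q
```

For instance, estimate_update could be a running Q-learning average, q[a] += alpha * (r - q[a]), and adapt any of the action adaptation processes evaluated in the paper.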
“…This problem is addressed by the literature on learning in games; the dynamics of learning processes in repeated games are a well-investigated branch of game theory (see [6], for example). In particular, the results that are relevant to this work are the guaranteed convergence to Nash equilibrium in potential games of a variety of action adaptation processes, including finite-memory better reply processes [8], adaptive play [9], joint-strategy fictitious play [10], fading-memory regret monitoring [11], and generalised weakened fictitious play [7]; we also include in our investigation regret-matching [23], which converges to the set of correlated equilibria. Thus, a decentralised solution to an optimisation problem can be found by, first, constructing a potential game from the optimisation problem, and then using one of these algorithms to compute an equilibrium.…”
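Of the processes listed, regret-matching is perhaps the simplest to state concretely: a player tracks her time-averaged regret for not having switched to each alternative action, and switches to an alternative with probability proportional to its positive regret, keeping the residual probability on her previous action. A sketch of one round in the spirit of Hart and Mas-Colell's rule, with all variable names ours and mu an assumed normalizing constant:

```python
import numpy as np

def regret_matching_step(avg_regret, prev_action, round_payoffs, t, mu, rng):
    """One round of regret matching for a single player (sketch).

    avg_regret[k]   : average regret for not having played k instead.
    round_payoffs[k]: this round's payoff of each own action against the
                      opponents' realized play (assumed evaluable).
    mu              : normalizing constant, taken large enough that the
                      switching probabilities below are valid.
    """
    n = len(avg_regret)
    # Update the running average of regrets against the action just played.
    instant = round_payoffs - round_payoffs[prev_action]
    avg_regret = avg_regret + (instant - avg_regret) / t
    # Switch to k != prev_action with probability R^+(k) / mu; keep the
    # residual probability mass on the previous action.
    probs = np.maximum(avg_regret, 0.0) / mu
    probs[prev_action] = 0.0
    stay = 1.0 - probs.sum()
    if stay >= 0.0:
        probs[prev_action] = stay
    else:
        probs /= probs.sum()                  # mu too small; renormalize
    return int(rng.choice(n, p=probs)), avg_regret
```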
This paper demonstrates a decentralised method for optimisation using game-theoretic multi-agent techniques, applied to a sensor network management problem. Our first major contribution is to show how the marginal contribution utility design is used to construct an unknown-reward potential game formulation of the problem. This formulation exploits the sparse structure of sensor network problems, and allows us to bound the price of anarchy of the Nash equilibria of the induced game. Furthermore, since the game is a potential game, solutions can be found using multiagent learning techniques. The techniques we derive use Q-learning to estimate an agent's rewards, while an action adaptation process responds to the behaviour of an agent's opponents. However, there are many different algorithmic configurations that could be used to solve these games. Thus, our second major contribution is an extensive evaluation of several action adaptation processes. Specifically, we compare six algorithms across a variety of parameter settings to ascertain the quality of the solutions they produce, their speed of convergence, and their robustness to pre-specified parameter choices. Our results show that they each perform similarly across a wide range of parameters. There is, however, a significant effect from moving to a learning policy with sampling probabilities that go to zero too quickly for rewards to be accurately estimated.
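The marginal contribution (or "wonderful life") utility design mentioned above gives each agent the difference its action makes to the global reward relative to a null action; with this design, the global reward itself serves as a potential function for the induced game. A minimal sketch, where global_reward and null_action are illustrative stand-ins for the sensor-network objective and a 'sensor off' action:

```python
def marginal_contribution(global_reward, joint_action, i, null_action=None):
    """Marginal-contribution ("wonderful life") utility for agent i (sketch).

    u_i(a) = G(a) - G(a with agent i's action replaced by a null action).
    joint_action is a dict mapping agents to actions; global_reward and
    null_action are illustrative stand-ins, not the paper's identifiers.
    """
    without_i = dict(joint_action)
    without_i[i] = null_action        # agent i contributes nothing
    return global_reward(joint_action) - global_reward(without_i)
```

Because any unilateral change in an agent's action changes its utility by exactly the change in the global reward, the induced game is a potential game with potential G.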
We consider in this chapter a class of two-player nonzero-sum stochastic games with incomplete information, which is inspired by recent applications of game theory in network security. We develop fully distributed reinforcement learning algorithms, which require for each player a minimal amount of information regarding the other player. If a player is in an active mode, she updates her strategy and estimates of unknown quantities using a specific pure or hybrid learning pattern. The players' intelligence and rationality are captured by the weighted linear combination of different learning patterns. We use stochastic approximation techniques to show that, under appropriate conditions, the pure or hybrid learning schemes with random updates can be studied using their deterministic ordinary differential equation (ODE) counterparts. Convergence to state-independent equilibria is analyzed for special classes of games, namely, games with two actions, and potential games. Results are applied to network security games between an intruder and an administrator, where the noncooperative behaviors are well characterized by the features of distributed hybrid learning.
INTRODUCTION
In recent years, game-theoretic methods have been applied to study resource allocation… In this chapter, we consider a class of two-player nonzero-sum stochastic games with incomplete information. We develop fully distributed payoff and strategy reinforcement learning (CODIPAS-RL) algorithms, which require for each player a minimal amount of information regarding the other player. At each time, each player can be in an active mode or in a sleep mode. If a player is in an active mode, she updates her strategy and estimates of unknown quantities using a specific pure or hybrid learning pattern. In contrast to the standard reinforcement learning algorithms, which focus only on either strategy or payoff reinforcement for equilibrium learning,…
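A combined payoff-and-strategy scheme of this kind couples two updates: the played action's payoff estimate moves toward the observed reward, and the mixed strategy drifts toward a smoothed best response to the current estimates, with sleeping players updating nothing. The sketch below is illustrative only; the step sizes, temperature, and Boltzmann response are our assumptions, not the chapter's exact CODIPAS-RL rules.

```python
import numpy as np

def codipas_step(x, r_hat, action, reward,
                 lam=0.1, nu=0.05, tau=0.1, active=True):
    """One combined payoff-and-strategy reinforcement step (sketch).

    x     : current mixed strategy over own actions.
    r_hat : running estimates of each action's expected payoff.
    active: a player in sleep mode updates nothing this round.
    lam, nu, tau and the Boltzmann response are illustrative assumptions.
    """
    if not active:                      # sleep mode: no update this round
        return x, r_hat
    # Payoff learning: pull the played action's estimate toward the reward.
    r_hat = r_hat.copy()
    r_hat[action] += lam * (reward - r_hat[action])
    # Strategy learning: drift toward a Boltzmann response to the estimates.
    weights = np.exp(r_hat / tau)
    boltzmann = weights / weights.sum()
    x = (1.0 - nu) * x + nu * boltzmann
    return x, r_hat
```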