Abstract: We present a view of cooperative control using the language of learning in games. We review the game-theoretic concepts of potential and weakly acyclic games, and demonstrate how several cooperative control problems, such as consensus and dynamic sensor coverage, can be formulated in these settings. Motivated by this connection, we build upon game-theoretic concepts to accommodate a broader class of cooperative control problems. In particular, we extend existing learning algorithms to handle restricted action sets caused by limitations in agent capabilities and by group-based decision making. Furthermore, we introduce a new class of games, called sometimes weakly acyclic games, for time-varying objective functions and action sets, and provide distributed algorithms that converge to an equilibrium.
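As a concrete illustration of how a cooperative control problem such as consensus fits the potential game framework discussed above, the sketch below sets up a hypothetical three-agent consensus problem on a line graph and checks the exact potential property: any unilateral change in an agent's action changes that agent's utility and the global potential by the same amount. The action set, graph, and the utility and potential functions are illustrative choices, not taken from the paper.

```python
# Hypothetical toy example: consensus on a line graph as an exact potential game.
import itertools

ACTIONS = [0, 1, 2, 3]                     # finite, common action set
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}    # three agents on a line graph

def utility(i, a):
    """Agent i's utility under joint action a (dict: agent -> action)."""
    return -sum(abs(a[i] - a[j]) for j in NEIGHBORS[i])

def potential(a):
    """Global potential: negated disagreement summed over each edge once."""
    return -sum(abs(a[i] - a[j])
                for i in NEIGHBORS for j in NEIGHBORS[i] if i < j)

# Check the exact-potential property over every unilateral deviation:
# the change in the deviating agent's utility equals the change in the potential.
for vals in itertools.product(ACTIONS, repeat=3):
    a = dict(enumerate(vals))
    for i in NEIGHBORS:
        for alt in ACTIONS:
            b = {**a, i: alt}
            assert utility(i, b) - utility(i, a) == potential(b) - potential(a)
print("consensus utilities form an exact potential game on this line graph")
```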
Log-linear learning is a learning algorithm with equilibrium selection properties: in potential games, it provides guarantees on the fraction of time that the joint action profile spends at a potential function maximizer. The traditional analysis of log-linear learning has centered on explicitly computing the stationary distribution. This analysis relied on a highly structured setting: i) players' utility functions constitute a potential game, ii) players update their strategies one at a time, which we refer to as asynchrony, iii) at any stage, a player can select any action in the action set, which we refer to as completeness, and iv) each player is endowed with the ability to assess the utility he would have received for any alternative action, provided that the actions of all other players remain fixed. Since the appeal of log-linear learning is not solely the explicit form of the stationary distribution, we ask to what degree these structural assumptions can be relaxed while still guaranteeing that only potential function maximizers are the stochastically stable action profiles. In this paper, we introduce slight variants of log-linear learning that allow both synchronous updates and incomplete action sets, and in both settings we prove that only potential function maximizers are stochastically stable. Furthermore, we introduce a payoff-based version of log-linear learning, in which players are aware only of the utility they received and the action they played; note that log-linear learning in its original form is not a payoff-based learning algorithm. For payoff-based log-linear learning, we again prove that only potential function maximizers are stochastically stable. The key enabler for these results is a shift in the analysis, away from deriving the explicit form of the stationary distribution of the learning process and toward characterizing its stochastically stable states. The resulting analysis uses the theory of resistance trees for regular perturbed Markov processes, thereby allowing a relaxation of the aforementioned structural assumptions.
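For reference, the following is a minimal sketch of the baseline algorithm whose assumptions the abstract above relaxes: standard asynchronous log-linear learning over a complete action set. The game is assumed to be exposed through a hypothetical utility(i, joint) function and a per-player action_sets list; this illustrates the textbook update rule, not the paper's synchronous, incomplete, or payoff-based variants.

```python
# Minimal sketch of one step of asynchronous log-linear learning (assumed
# interface: `utility(i, joint)` returns player i's payoff for the joint action
# list `joint`, and `action_sets[i]` lists player i's available actions).
import math
import random

def log_linear_step(joint, action_sets, utility, tau=0.1):
    """One asynchronous log-linear update of the joint action list `joint`."""
    i = random.randrange(len(joint))              # asynchrony: a single updater
    weights = []
    for a in action_sets[i]:                      # completeness: whole action set
        trial = list(joint)
        trial[i] = a
        weights.append(math.exp(utility(i, trial) / tau))
    # play a_i with probability proportional to exp(U_i(a_i, a_-i) / tau)
    joint[i] = random.choices(action_sets[i], weights=weights)[0]
    return joint
```

As the temperature tau tends to zero, the update concentrates on best responses, and in a potential game the induced Markov chain places most of its stationary mass on potential function maximizers.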
We consider a continuous-time form of repeated matrix games in which player strategies evolve in reaction to opponent actions. Players observe each other's actions but do not have access to the other players' utilities. Strategy evolution may be of the best-response sort, as in fictitious play, or a gradient update. Such mechanisms are known not to converge in general. We introduce "dynamic" forms of the fictitious play and gradient play strategy update mechanisms. These mechanisms use derivative action in processing opponent actions and, in some cases, lead to behavior that converges to Nash equilibria in previously nonconvergent situations. We analyze convergence of the dynamic update mechanisms under both exact and approximate derivative measurements. In the ideal case of exact derivative measurements, we show that convergence to Nash equilibrium can always be achieved. In the case of approximate derivative measurements, we derive a characterization of local convergence that shows how the dynamic update mechanisms can converge even when their traditional static counterparts do not. We primarily discuss two-player games, but also outline extensions to multiplayer games. We illustrate these methods with convergent simulations of the well-known Shapley and Jordan counterexamples.
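To make the idea of derivative action concrete, here is a discrete-time sketch of projected gradient play on a two-player matrix game, with a simple finite-difference extrapolation term standing in for the derivative measurement. It captures the spirit of the dynamic update mechanisms rather than the paper's continuous-time formulation; the step size eta, gain gamma, and the project_simplex helper are illustrative choices.

```python
# Illustrative sketch (not the paper's exact scheme): projected gradient play
# with a finite-difference "derivative action" term on a two-player matrix game.
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def dynamic_gradient_play(A, B, steps=20000, eta=0.02, gamma=1.0):
    """Row player maximizes x^T A y, column player maximizes x^T B y.
    gamma = 0 recovers static (projected) gradient play; gamma > 0 adds a
    finite-difference derivative term that anticipates the opponent's motion."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m
    x[0] += 0.2
    x = project_simplex(x)                          # start away from equilibrium
    x_prev, y_prev = x.copy(), y.copy()
    for _ in range(steps):
        # extrapolate each opponent's recent movement (derivative action)
        y_ant = y + gamma * (y - y_prev)
        x_ant = x + gamma * (x - x_prev)
        x_prev, y_prev = x.copy(), y.copy()
        # simultaneous projected gradient steps against anticipated opponents
        x = project_simplex(x + eta * (A @ y_ant))
        y = project_simplex(y + eta * (B.T @ x_ant))
    return x, y

# Matching pennies: the derivative-action update settles near the mixed
# equilibrium (0.5, 0.5), whereas gamma = 0 tends to cycle.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(dynamic_gradient_play(A, -A))
```

Setting gamma = 0 recovers static gradient play, which tends to spiral away from the mixed equilibrium on this zero-sum example, while the derivative-action term tends to damp the oscillation toward it.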