Learning with Opponent-Learning Awareness

Foerster, Jakob; Chen, Richard Y.; Al-Shedivat, Maruan; Whiteson, Shimon; Abbeel, Pieter; Mordatch, Igor

doi:10.48550/arxiv.1709.04326

Cited by 27 publications

(33 citation statements)

References 19 publications

“…Ref. [625] shows that naive and commonly defecting reinforcement learners start to cooperate when they incorporate in their own learning process the awareness of their opponent's learning. Appropriately dubbed learning with opponent-learning awareness or LOLA, the approach leads to the emergence of tit-for-tat and consequent cooperation in the iterated prisoners' dilemma.…”

Section: AI Agents for Promoting Cooperation (mentioning)

confidence: 99%
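The statement above turns on why tit-for-tat sustains cooperation in the iterated prisoner's dilemma. The following is a minimal Python sketch of that game only, not of LOLA or of any learning procedure. The payoff values, the function names, and the 10-round horizon are illustrative assumptions; any payoffs with the usual temptation > reward > punishment > sucker ordering would make the same point.

```python
from typing import Callable, List, Tuple

C, D = "C", "D"  # cooperate, defect

# Assumed per-round payoffs (player A, player B); illustrative values only.
PAYOFFS = {
    (C, C): (-1, -1),
    (C, D): (-3, 0),
    (D, C): (0, -3),
    (D, D): (-2, -2),
}

# A strategy maps the opponent's past moves to the next move.
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return C if not opponent_history else opponent_history[-1]


def always_defect(opponent_history: List[str]) -> str:
    """Defect regardless of what the opponent did."""
    return D


def play(strategy_a: Strategy, strategy_b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play the repeated game and return the total payoff of each player."""
    moves_a: List[str] = []
    moves_b: List[str] = []
    total_a = total_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # A sees only B's past moves
        b = strategy_b(moves_a)  # B sees only A's past moves
        payoff_a, payoff_b = PAYOFFS[(a, b)]
        total_a += payoff_a
        total_b += payoff_b
        moves_a.append(a)
        moves_b.append(b)
    return total_a, total_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))      # mutual cooperation
    print(play(always_defect, always_defect))  # mutual defection
    print(play(tit_for_tat, always_defect))    # exploited once, then retaliates
```

Under these assumed payoffs, two tit-for-tat players each do better than two unconditional defectors, and a defector gains only a one-round advantage against tit-for-tat before both fall into mutual defection.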

“…One of the best-performing strategies in terms of the overall average score is the Desired Belief Strategy [632], which actively analyses the opponent and responds depending on whether the opponent's action is perceived as noise or a genuine behavioural change. Ultimately, an inescapable conclusion is that reinforcement learning is an effective means to construct strong strategies for various iterated social-dilemma situations [625,631,633,634].…”

Section: AI Agents for Promoting Cooperation (mentioning)

confidence: 99%

Social physics

Jusup, Holme, Kanazawa et al. 2021

Preprint

Recent decades have seen a rise in the use of physics-inspired or physics-like methods in attempts to resolve diverse societal problems. Such a rise is driven both by physicists venturing outside of their traditional domain of interest and by scientists from other domains who wish to mimic the enormous success of physics throughout the 19th and 20th centuries. Here, we dub the physics-inspired and physics-like work on societal problems "social physics", and pay our respect to intellectual mavericks who nurtured the field to its maturity. We do so by comprehensively (but not exhaustively) reviewing the current state of the art. Starting with a set of topics that pertain to the modern way of living and factors that enable humankind's prosperous existence, we discuss urban development and traffic, the functioning of financial markets, cooperation as a basis for civilised life, the structure of (social) networks, and the integration of intelligent machines in such networks. We then shift focus to a set of topics that explore potential threats to humanity. These include criminal behaviour, massive migrations, contagions, environmental problems, and finally climate change. The coverage of each topic is ended with ideas for future progress. Based on the number of ideas laid out, but also on the fact that the field is already too big for an exhaustive review despite our best efforts, we are forced to conclude that the future for social physics is bright. Physicists tackling societal problems are no longer a curiosity, but rather a force to be reckoned with, yet for reckoning to be truly productive, it is necessary to build dialog and mutual understanding with social scientists, environmental scientists, philosophers, and more.

“…For example, Lockhart et al [2019] performs direct policy optimization against worst-case opponents and effectively finds an NE in Kuhn Poker and Goofspiel card game. Foerster et al [2017] invented LOLA where each agent shapes learning of other agents. It gave the highest average returns on the iterated prisoners' dilemma (IPD).…”

Section: Introduction (mentioning)

confidence: 99%
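The phrase "each agent shapes learning of other agents" can be made concrete with a short derivation. What follows is a sketch of the first-order update usually associated with LOLA, written under assumptions: the symbols $V^i$ (expected return of agent $i$), $\theta^i$ (its policy parameters), and the step sizes $\delta$ and $\eta$ are introduced here for illustration and do not appear in the statement above.

```latex
% Naive learner: agent 1 ascends its own return, treating \theta^2 as fixed.
\theta^1 \leftarrow \theta^1 + \delta \, \nabla_{\theta^1} V^1(\theta^1, \theta^2)

% Opponent-aware learner: agent 1 assumes agent 2 takes one naive step
% \Delta\theta^2 = \eta \, \nabla_{\theta^2} V^2(\theta^1, \theta^2)
% and expands its own return to first order in that step.
V^1(\theta^1, \theta^2 + \Delta\theta^2)
  \approx V^1(\theta^1, \theta^2)
  + (\Delta\theta^2)^{\top} \nabla_{\theta^2} V^1(\theta^1, \theta^2)

% Differentiating with respect to \theta^1 and keeping the term through which
% \theta^1 influences the opponent's step gives the update direction, with the
% mixed second derivative arranged so the product has the shape of \theta^1.
\theta^1 \leftarrow \theta^1
  + \delta \, \nabla_{\theta^1} V^1
  + \delta \eta \, \bigl(\nabla_{\theta^2} V^1\bigr)^{\top}
    \nabla_{\theta^1} \nabla_{\theta^2} V^2
```

The second term is what distinguishes the opponent-aware learner from the naive one: it rewards changes to $\theta^1$ that steer the opponent's next learning step in a direction favourable to agent 1.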

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Zhao, Tian, Lee et al. 2021

Preprint

Policy gradient methods are widely used in solving two-player zero-sum games to achieve superhuman performance in practice. However, it remains elusive when they can provably find a near-optimal solution and how many samples and iterations are needed. The current paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games where function approximation is used for generalization across states. We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error. To our knowledge, this is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games.

“…In other words, we perform our experiments in scenarios with a similar nature to the one depicted in Figure 1 that essentially require all agents to work together and success cannot be achieved by any of them individually. Our main contributions are as follows: [8]. Additionally, the idea of decorrelating training samples by drawing them from an experience replay buffer becomes obsolete and a multi-agent derivation of importance sampling can be employed to remove the outdated samples from the replay buffer [9].…”

Section: Introduction (mentioning)

confidence: 99%

Cooperative Autonomous Vehicles that Sympathize with Human Drivers

Toghi, Valiente, Sadigh et al. 2021

Preprint

Widespread adoption of autonomous vehicles will not become a reality until solutions are developed that enable these intelligent agents to co-exist with humans. This includes safely and efficiently interacting with human-driven vehicles, especially in both conflictive and competitive scenarios. We build on prior work on socially-aware navigation and borrow the concept of social value orientation from psychology, which formalizes how much importance a person allocates to the welfare of others, in order to induce altruistic behavior in autonomous driving. In contrast with existing works that explicitly model the behavior of human drivers and rely on their expected response to create opportunities for cooperation, our Sympathetic Cooperative Driving (SymCoDrive) paradigm trains altruistic agents that realize safe and smooth traffic flow in competitive driving scenarios only from experiential learning and without any explicit coordination. We demonstrate a significant improvement in both safety and traffic-level metrics as a result of this altruistic behavior and importantly conclude that the level of altruism in agents requires proper tuning as agents that are too altruistic also lead to sub-optimal traffic flow. The code and supplementary material are available at: https://symcodrive.toghi.net/
