Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/484
Approximate Exploitability: Learning a Best Response


Cited by 6 publications (7 citation statements) | References 1 publication
“…Unlike extensive-form fictitious play [Heinrich et al, 2015] and counterfactual regret minimization [Zinkevich et al, 2007], their convergence result pertains to the strategies being optimized rather than the time-average strategies. Timbers et al [2022] introduced approximate exploitability, which uses approximate best responses computed through a combination of search and reinforcement learning. It generalizes a domain-specific technique for poker called local best response [Lisý and Bowling, 2017].…”
Section: A Further Related Work (mentioning)
confidence: 99%
“…ψ is non-negative and zero precisely at Nash equilibria. It is also known as the NashConv in the literature [Lanctot et al, 2017a; Lockhart et al, 2019; Walton and Lisy, 2021; Timbers et al, 2022], and is the standard measure of closeness to Nash equilibrium. Our goal is to find strategy profiles with low exploitability.…”
Section: Introduction (mentioning)
confidence: 99%
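
For reference, the quantity ψ in the passage above is the NashConv sum of per-player incentives to deviate; a standard formulation from the cited literature (reconstructed here rather than quoted from this report, with u_i denoting player i's expected utility and π_{-i} the other players' strategies) is

\psi(\pi) = \sum_{i \in N} \Big( \max_{\pi_i'} u_i(\pi_i', \pi_{-i}) - u_i(\pi) \Big)

Each summand is player i's gain from switching to a best response against π_{-i}, so ψ(π) ≥ 0, with equality exactly when π is a Nash equilibrium, as the quoted sentence states.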
“…Convergence metric for T-FP. To estimate the convergence of the sequence of strategy pairs generated by T-FP, we use the approximate exploitability metric δ [102]:…”
Section: A Learning Equilibrium Strategies Through Self-play (mentioning)
confidence: 99%
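
The quoted passage elides the expression for δ after the colon. A plausible sketch, following the approximate-exploitability construction of Timbers et al [2022] rather than the exact form used in the citing paper, replaces each exact best response in ψ with a learned approximate best response \hat{b}_i:

\delta(\pi) = \sum_{i \in N} \Big( u_i(\hat{b}_i, \pi_{-i}) - u_i(\pi) \Big)

where each \hat{b}_i is trained against the fixed opponent strategies π_{-i} by combining search with reinforcement learning. Because \hat{b}_i can be no better than the exact best response, δ(π) ≤ ψ(π): the metric lower-bounds the true exploitability and tightens as the approximate best responses improve.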
“…One simple example is chess, where rule-based AI surpassed humans in 1997 (Campbell, Hoane Jr, and Hsu 2002), eventually followed by RL-based AI methods (Silver et al 2018). Nobody would claim that superhuman chess or Go algorithms can do anything other than play the given games, and even simple tweaks to the game rules, and unusual or adversarial strategies can throw the algorithms off (Lan et al 2022; Timbers et al 2020; Wang et al 2022). Even deep RL algorithms that can master multiple Atari games (Mnih et al 2013, 2015) are still ultimately constrained to a certain subset of game types.…”
Section: Why Embodiment Is Key For AGI (mentioning)
confidence: 99%