2014
DOI: 10.1609/aaai.v28i1.8830

Using Response Functions to Measure Strategy Strength

Abstract: Extensive-form games are a powerful tool for representing complex multi-agent interactions. Nash equilibrium strategies are commonly used as a solution concept for extensive-form games, but many games are too large for the computation of Nash equilibria to be tractable. In these large games, exploitability has traditionally been used to measure deviation from Nash equilibrium, and thus strategies are aimed to achieve minimal exploitability. However, while exploitability measures a strategy's worst-case perform…
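For reference (a standard definition, not text quoted from the abstract): in a two-player zero-sum game, the exploitability of a strategy $\sigma_i$, with $u_i$ the payoff to player $i$ and $v_i^*$ that player's game value, is usually written as

\mathrm{expl}(\sigma_i) = v_i^* - \min_{\sigma_{-i}} u_i(\sigma_i, \sigma_{-i}),

i.e., how much a best-responding opponent can gain over the game value; a Nash equilibrium strategy has exploitability zero.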

Cited by 13 publications (6 citation statements) · References 7 publications

Citation statements:
“…Because QSE and QNE are usually non-equivalent concepts even in zero-sum games (see Figure 1), the regret-minimization algorithms will not converge to QSE. However, if a quantal function satisfies the so-called pretty-good-response condition, the algorithm converges to the leader strategy that exploits the follower the most (Davis, Burch, and Bowling 2014). We show that a class of simple (i.e., attaining only a finite number of values) quantal functions satisfies the pretty-good-response condition.…”
Section: Algorithms for Computing QSE
confidence: 95%
“…However, when creating AI agents competing with humans, we want to assume that one of the players is perfectly rational, and only the opponent's rationality is bounded. A tempting approach may be to use the algorithms for computing QRE and increase one player's rationality, or to use generic algorithms for exploiting opponents (Davis, Burch, and Bowling 2014) even though the QR model does not satisfy their assumptions, as in (Basak et al. 2018). However, this approach generally leads to a solution concept we call Quantal Nash Equilibrium (QNE), which we show is very inefficient in exploiting QR opponents and may even perform worse than an arbitrary Nash equilibrium.…”
Section: Introduction
confidence: 99%
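For context (the standard logit model, an assumption rather than text from this excerpt), the quantal-response (QR) follower referenced here typically plays each action a with probability

q(a) = \exp(\lambda\, u(a)) \Big/ \sum_{a'} \exp(\lambda\, u(a')),

where u(a) is the action's expected utility and \lambda \ge 0 is the rationality parameter; this interpolates between uniform random play (\lambda = 0) and a best response (\lambda \to \infty).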
“…Opponent modeling is the problem of estimating the properties of an opponent (Nashed & Zilberstein, 2022). Much previous work on this topic has been done in imperfect-information games like poker (Billings et al., 1998; Bard et al., 2013, 2015; Davis et al., 2014), but that work focuses on strategic characteristics and limitations of the opponents, and the domains do not include execution uncertainty. Additional areas where opponent modeling has been explored include general multi-agent systems (Carmel & Markovitch, 1995), real-time strategy games (Schadd et al., 2007), and n-player games (Sturtevant et al., 2006).…”
Section: Opponent Modeling
confidence: 99%
“…Elo as metric: Davis et al. (2014) showed that in many large imperfect-information games the computation of a Nash equilibrium is not tractable, and that measuring the deviation from it is not a good measure of an agent's quality in all cases; e.g., they showed that a more exploitable agent can beat a less exploitable agent in some situations. Furthermore, they argue that calculating the exploitability can itself become a problem in large games.…”
Section: Training Data and Process
confidence: 99%
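To make this last point concrete, here is a minimal sketch (a hypothetical rock-paper-scissors example, not an experiment from Davis et al. 2014) in which the strategy with higher exploitability still wins the head-to-head match:

import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
# Rows/columns: 0 = rock, 1 = paper, 2 = scissors.
U = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def exploitability(strategy):
    # Value a best-responding opponent achieves against `strategy`.
    # RPS has game value 0, so this is exactly the strategy's exploitability:
    # max over opponent pure actions of -u_row(strategy, action).
    return float(np.max(-strategy @ U))

def head_to_head(row_strategy, col_strategy):
    # Expected payoff to the row player when both players mix as given.
    return float(row_strategy @ U @ col_strategy)

# Two hypothetical (non-equilibrium) strategies over (rock, paper, scissors).
a = np.array([0.4, 0.3, 0.3])   # mildly biased toward rock
b = np.array([0.2, 0.5, 0.3])   # heavily biased toward paper

print(exploitability(a))      # 0.10
print(exploitability(b))      # 0.30
print(head_to_head(b, a))     # 0.02 > 0: b wins the head-to-head match

Strategy b is three times as exploitable as a, yet earns a positive expected payoff against it head to head, which is precisely the mismatch between worst-case exploitability and one-on-one performance that this citation describes.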