2020
DOI: 10.1609/aaai.v34i02.5578

Double-Oracle Sampling Method for Stackelberg Equilibrium Approximation in General-Sum Extensive-Form Games

Abstract: The paper presents a new method for approximating Strong Stackelberg Equilibrium in general-sum sequential games with imperfect information and perfect recall. The proposed approach is generic as it does not rely on any specific properties of a particular game model. The method is based on iterative interleaving of the two following phases: (1) guided Monte Carlo Tree Search sampling of the Follower's strategy space and (2) building the Leader's behavior strategy tree for which the sampled Follower's strategy …

Cited by 13 publications (9 citation statements)
References 23 publications

“…The first approach (referred to as O2UCT - double-oracle UCT sampling) [11,13] relies on a guided sampling of the follower's strategy space interleaved with finding a feasible leader's strategy using double-oracle method.…”
Section: A Summary of O2UCT Methods (mentioning)
confidence: 99%
“…δ_l must satisfy the following conditions: (1) π^r_f is the best response strategy against δ_l; (2) δ_l provides as high as possible leader's utility when played against the best follower's response. An algorithm for finding the requested leader's strategy δ_l is outlined below and detailed in [13].…”
Section: A Summary of O2UCT Methods (mentioning)
confidence: 99%
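
The two conditions quoted above can be illustrated on a toy example. The sketch below is not the procedure detailed in [13]: it assumes a hypothetical 2x2 normal-form game rather than an extensive-form one, the payoff tables are invented for illustration, and it uses a plain grid search over the leader's mixed strategies instead of any sampling or double-oracle step. It only shows what "the follower best-responds" and "the leader's utility is as high as possible against that response" mean when follower ties are broken in the leader's favor, as in a Strong Stackelberg Equilibrium.

```python
from fractions import Fraction

# Hypothetical payoffs (assumption, not from the source).
# Rows are leader actions, columns are follower actions.
LEADER_PAYOFF = [[2, 4], [1, 3]]
FOLLOWER_PAYOFF = [[1, 0], [0, 1]]


def expected(payoff, p_top, col):
    """Expected payoff for column `col` when the leader plays row 0 with probability p_top."""
    return p_top * payoff[0][col] + (1 - p_top) * payoff[1][col]


def best_responses(p_top):
    """Condition (1): follower columns that maximize the follower's expected payoff."""
    values = [expected(FOLLOWER_PAYOFF, p_top, c) for c in range(2)]
    top = max(values)
    return [c for c, v in enumerate(values) if v == top]


def strong_stackelberg(grid=100):
    """Condition (2): pick the leader mix with the highest leader payoff,
    letting the follower break ties in the leader's favor."""
    best = None
    for k in range(grid + 1):
        p_top = Fraction(k, grid)
        for col in best_responses(p_top):
            value = expected(LEADER_PAYOFF, p_top, col)
            if best is None or value > best[0]:
                best = (value, p_top, col)
    return best


value, p_top, col = strong_stackelberg()
print(value, p_top, col)  # prints: 7/2 1/2 1
```

In this toy game the leader does better by mixing evenly (utility 7/2) than by committing to either pure action, because the even mix leaves the follower indifferent and the tie is resolved in the leader's favor.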
“…CBK2018 [17]), others employ different ideas. O2UCT [18,19], for instance, utilizes Upper Confidence Bounds applied to Trees [20] (a variant of Monte Carlo Tree Search) and combines sampling the follower's strategy space with calculating the best leader's strategy for which a sampled follower's strategy is the optimal response. Another heuristic method, EASG [21], maintains a population of candidate leader's strategies and applies specifically designed mutation and crossover operators.…”
Section: State-of-the-Art Approaches (mentioning)
confidence: 99%