Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022
DOI: 10.24963/ijcai.2022/484
Approximate Exploitability: Learning a Best Response


Cited by 6 publications (7 citation statements) | References 1 publication
“…Unlike extensive-form fictitious play [Heinrich et al, 2015] and counterfactual regret minimization [Zinkevich et al, 2007], their convergence result pertains to the strategies being optimized rather than the time-average strategies. Timbers et al [2022] introduced approximate exploitability, which uses approximate best responses computed through a combination of search and reinforcement learning. It generalizes a domain-specific technique for poker called local best response [Lisý and Bowling, 2017].…”
Section: A Further Related Work (mentioning)
confidence: 99%
“…ψ is non-negative and zero precisely at Nash equilibria. It is also known as the NashConv in the literature [Lanctot et al, 2017a; Lockhart et al, 2019; Walton and Lisy, 2021; Timbers et al, 2022], and is the standard measure of closeness to Nash equilibrium. Our goal is to find strategy profiles with low exploitability.…”
Section: Introduction (mentioning)
confidence: 99%
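
For reference, the quantity ψ in the passage above is the NashConv sum of per-player incentives to deviate; a standard formulation from the cited literature (reconstructed here rather than quoted from this report, with u_i denoting player i's expected utility and π_{-i} the other players' strategies) is

\psi(\pi) = \sum_{i \in N} \Big( \max_{\pi_i'} u_i(\pi_i', \pi_{-i}) - u_i(\pi) \Big)

Each summand is player i's gain from switching to a best response against π_{-i}, so ψ(π) ≥ 0, with equality exactly when π is a Nash equilibrium, as the quoted sentence states.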
“…Convergence metric for T-FP. To estimate the convergence of the sequence of strategy pairs generated by T-FP, we use the approximate exploitability metric δ [102]:…”
Section: A Learning Equilibrium Strategies Through Self-play (mentioning)
confidence: 99%
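
The quoted passage elides the expression for δ after the colon. A plausible sketch, following the approximate-exploitability construction of Timbers et al [2022] rather than the exact form used in the citing paper, replaces each exact best response in ψ with a learned approximate best response \hat{b}_i:

\delta(\pi) = \sum_{i \in N} \Big( u_i(\hat{b}_i, \pi_{-i}) - u_i(\pi) \Big)

where each \hat{b}_i is trained against the fixed opponent strategies π_{-i} by combining search with reinforcement learning. Because \hat{b}_i can be no better than the exact best response, δ(π) ≤ ψ(π): the metric lower-bounds the true exploitability and tightens as the approximate best responses improve.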
“…One simple example is chess, where rule-based AI surpassed humans in 1997 (Campbell, Hoane Jr, and Hsu 2002), eventually followed by RL-based AI methods (Silver et al 2018). Nobody would claim that superhuman chess or Go algorithms can do anything other than play the given games, and even simple tweaks to the game rules, and unusual or adversarial strategies can throw the algorithms off (Lan et al 2022; Timbers et al 2020; Wang et al 2022). Even deep RL algorithms that can master multiple Atari games (Mnih et al 2013, 2015) are still ultimately constrained to a certain subset of game types.…”
Section: Why Embodiment Is Key For AGI (mentioning)
confidence: 99%