2022
DOI: 10.1007/978-3-031-13188-2_17

Specification-Guided Learning of Nash Equilibria with High Social Welfare

Abstract: Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the objective of each agent. Then, guided by the structure of the specifications, our algorithm s…

Cited by 14 publications (17 citation statements). References 25 publications.
“…Although we only demonstrated three examples, our theorem also applies to other objectives in the literature. Some examples are (1) modifications to the simple reward machine such as the (standard) reward machine (Camacho et al 2019) (where rewards depend on not only the reward machine's state but also the environment's state) and the stochastic reward machine (Corazza, Gavran, and Neider 2022), (2) other LTL-in-the-limit objectives (Sadigh et al 2014), and (3) various finite-horizon objectives (Henriques et al 2012; Jothimurugan, Alur, and Bastani 2019; Giacomo et al 2019).…”
Section: Discussion
confidence: 99%
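
The simple reward machine referenced in this statement is a finite-state machine that reads high-level events emitted by the environment and returns a reward based on its own current state. The following is a minimal illustrative sketch, not code from the cited papers; the class name, events, and transition structure are hypothetical, and the "standard" reward machine of Camacho et al (2019) would additionally condition the reward on the environment's state.

    # Sketch of a simple reward machine: rewards and transitions depend only on
    # the machine state and the observed high-level event (all names hypothetical).
    class SimpleRewardMachine:
        def __init__(self, transitions, rewards, initial_state):
            self.transitions = transitions  # (machine state, event) -> next machine state
            self.rewards = rewards          # (machine state, event) -> reward
            self.state = initial_state

        def step(self, event):
            reward = self.rewards.get((self.state, event), 0.0)
            self.state = self.transitions.get((self.state, event), self.state)
            return reward

    # Hypothetical task "visit A, then visit B": reward 1 only for reaching B after A.
    rm = SimpleRewardMachine(
        transitions={("u0", "A"): "u1", ("u1", "B"): "u2"},
        rewards={("u1", "B"): 1.0},
        initial_state="u0",
    )
    print(rm.step("B"))  # 0.0 -- B before A earns nothing
    print(rm.step("A"))  # 0.0 -- machine advances to u1
    print(rm.step("B"))  # 1.0 -- sequence completed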
“…Previous work (Alur et al 2021) gave a framework of reductions between objectives whose flavor of generality is most similar to our work; however, they did not give a condition for when an objective is PAC-learnable. To our knowledge, the PAC-learnability of the objectives in Sadigh et al (2014); Littman et al (2017); Camacho et al (2019); and Jothimurugan, Alur, and Bastani (2019) is not known.…”
Section: Introduction
confidence: 99%
“…Finally, while we select LTL as the specification language for this paper, the approach can be adapted to generate any formal language capable of encoding the mission-relevant task specifications. Indeed, the development of a formal specification language for robotics is an active research area [32,14].…”
Section: Related Work
confidence: 99%
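
As an illustration of the kind of mission-relevant task specification LTL can encode (an example of ours, not one drawn from the cited works), the mission "eventually reach the goal region while always avoiding obstacles" can be written as

    F goal ∧ G ¬obstacle

where F ("eventually") and G ("always") are the standard LTL temporal operators, and goal and obstacle are atomic propositions labeling states of the system.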
“…In particular, we consider counterfactual conditionals that relate two properties expressed in temporal logics, such as the temporal property ¬ F e from the introductory example. Temporal logics are used ubiquitously as high-level specifications for verification [21,4] and synthesis [22,41], and recently have also found use in specifying reinforcement learning tasks [32,39]. Our work lifts the language of counterfactual reasoning to similar high-level expressions.…”
Section: Introduction
confidence: 99%
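
For readers unfamiliar with the notation, the property ¬F e ("e never occurs", equivalently G ¬e) can be evaluated on a finite trace of labeled steps. The sketch below is purely illustrative and assumes a trace represented as a list of sets of atomic propositions; it is not taken from the cited paper.

    # Check the temporal property "not eventually e" (equivalently, "always not e")
    # over a finite trace, where each step is the set of atomic propositions that hold.
    def never(trace, prop):
        return all(prop not in step for step in trace)

    trace = [{"start"}, {"moving"}, {"e"}, {"done"}]  # hypothetical labeling
    print(never(trace, "e"))  # False: 'e' occurs, so the property ¬F e is violated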