2022
DOI: 10.1007/978-3-031-13188-2_17

Specification-Guided Learning of Nash Equilibria with High Social Welfare

Abstract: Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the objective of each agent. Then, guided by the structure of the specifications, our algorithm s…

Cited by 14 publications (17 citation statements). References 25 publications.
“…Although we only demonstrated three examples, our theorem also applies to other objectives in the literature. Some examples are (1) modifications to the simple reward machine such as the (standard) reward machine (Camacho et al 2019) (where rewards depend on not only the reward machine's state but also the environment's state) and the stochastic reward machine (Corazza, Gavran, and Neider 2022), (2) other LTL-in-the-limit objectives (Sadigh et al 2014), and (3) various finite-horizon objectives (Henriques et al 2012; Jothimurugan, Alur, and Bastani 2019; Giacomo et al 2019).…”
Section: Discussion
confidence: 99%
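
The simple reward machine referenced in this statement is a finite-state machine that reads high-level events emitted by the environment and returns a reward based on its own current state. The following is a minimal illustrative sketch, not code from the cited papers; the class name, events, and transition structure are hypothetical, and the "standard" reward machine of Camacho et al (2019) would additionally condition the reward on the environment's state.

    # Sketch of a simple reward machine: rewards and transitions depend only on
    # the machine state and the observed high-level event (all names hypothetical).
    class SimpleRewardMachine:
        def __init__(self, transitions, rewards, initial_state):
            self.transitions = transitions  # (machine state, event) -> next machine state
            self.rewards = rewards          # (machine state, event) -> reward
            self.state = initial_state

        def step(self, event):
            reward = self.rewards.get((self.state, event), 0.0)
            self.state = self.transitions.get((self.state, event), self.state)
            return reward

    # Hypothetical task "visit A, then visit B": reward 1 only for reaching B after A.
    rm = SimpleRewardMachine(
        transitions={("u0", "A"): "u1", ("u1", "B"): "u2"},
        rewards={("u1", "B"): 1.0},
        initial_state="u0",
    )
    print(rm.step("B"))  # 0.0 -- B before A earns nothing
    print(rm.step("A"))  # 0.0 -- machine advances to u1
    print(rm.step("B"))  # 1.0 -- sequence completed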
“…Previous work (Alur et al 2021) gave a framework of reductions between objectives whose flavor of generality is most similar to our work; however, they did not give a condition for when an objective is PAC-learnable. To our knowledge, the PAC-learnability of the objectives in Sadigh et al (2014); Littman et al (2017); Camacho et al (2019); and Jothimurugan, Alur, and Bastani (2019) is not known.…”
Section: Introduction
confidence: 99%
“…Finally, while we select LTL as the specification language for this paper, the approach can be adapted to generate any formal language capable of encoding the mission-relevant task specifications. Indeed, the development of a formal specification language for robotics is an active research area [32,14].…”
Section: Related Work
confidence: 99%
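
As an illustration of the kind of mission-relevant task specification LTL can encode (an example of ours, not one drawn from the cited works), the mission "eventually reach the goal region while always avoiding obstacles" can be written as

    F goal ∧ G ¬obstacle

where F ("eventually") and G ("always") are the standard LTL temporal operators, and goal and obstacle are atomic propositions labeling states of the system.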
“…In particular, we consider counterfactual conditionals that relate two properties expressed in temporal logics, such as the temporal property ¬ F e from the introductory example. Temporal logics are used ubiquitously as high-level specifications for verification [21,4] and synthesis [22,41], and recently have also found use in specifying reinforcement learning tasks [32,39]. Our work lifts the language of counterfactual reasoning to similar high-level expressions.…”
Section: Introduction
confidence: 99%
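
For readers unfamiliar with the notation, the property ¬F e ("e never occurs", equivalently G ¬e) can be evaluated on a finite trace of labeled steps. The sketch below is purely illustrative and assumes a trace represented as a list of sets of atomic propositions; it is not taken from the cited paper.

    # Check the temporal property "not eventually e" (equivalently, "always not e")
    # over a finite trace, where each step is the set of atomic propositions that hold.
    def never(trace, prop):
        return all(prop not in step for step in trace)

    trace = [{"start"}, {"moving"}, {"e"}, {"done"}]  # hypothetical labeling
    print(never(trace, "e"))  # False: 'e' occurs, so the property ¬F e is violated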