2021
DOI: 10.48550/arxiv.2111.00272
Preprint
A Framework for Transforming Specifications in Reinforcement Learning

Abstract: Reactive synthesis algorithms allow automatic construction of policies to control an environment modeled as a Markov Decision Process (MDP) that are optimal with respect to high-level temporal logic specifications, assuming the MDP model is known a priori. Reinforcement learning algorithms, in contrast, are designed to learn an optimal policy when the transition probabilities of the MDP are unknown, but require the user to associate local rewards with transitions. The appeal of high-level temporal logic specific…
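The contrast the abstract draws, between a global temporal-logic objective and the local per-transition rewards RL algorithms require, can be illustrated with a hedged sketch. The two-state monitor and all names below are illustrative assumptions, not the paper's actual transformation: a tiny automaton tracks progress toward an "eventually reach g" specification, and reward is emitted only on the transition where the monitor first accepts.

```python
# Hedged sketch: compiling an "eventually reach g" specification into
# local rewards via a two-state monitor. All names are illustrative;
# this is not the paper's construction.

def monitor_step(q, state, goal):
    """Advance the spec monitor: q=0 means 'goal not yet seen', q=1 'done'."""
    return 1 if q == 1 or state == goal else 0

def local_reward(q, q_next):
    """Reward only the transition on which the monitor first accepts."""
    return 1.0 if q == 0 and q_next == 1 else 0.0

# Run the monitor along a trajectory of environment states.
trajectory = ["s0", "s1", "g", "s2"]
q, total = 0, 0.0
for state in trajectory:
    q_next = monitor_step(q, state, "g")
    total += local_reward(q, q_next)
    q = q_next
```

Because the monitor state is folded into the agent's state, an off-the-shelf RL algorithm can then maximize this local reward in place of the original temporal objective.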

Cited by 3 publications (5 citation statements)
References 16 publications (27 reference statements)
“…There has been recent work on using specifications based on temporal logic for specifying RL tasks in the single agent setting; a comprehensive survey may be found in [2]. There has also been recent work on using temporal logic specifications for multi-agent RL [10,22], but these approaches focus on cooperative scenarios in which there is a common objective that all agents are trying to achieve.…”
Section: RL From High-Level Specifications
confidence: 99%
“…where $\psi_{sas'}(n) \equiv \sqrt{2P(s,a,s')(1-P(s,a,s'))\,\xi(n)} + \frac{7}{3}\xi(n)$, $\psi(n) \equiv \sqrt{\tfrac{1}{2}\xi(n)} + \frac{7}{3}\xi(n)$, and $\xi(n) \equiv \log\!\left(\frac{4n^2|S|^2|A|}{\delta}\right)/(n-1)$. Our assumptions are consistent with the minimal requirements studied by [39] (Remark 4.1).…”
Section: Additional Assumptions and Definitions
confidence: 99%
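The quoted bounds have the shape of empirical-Bernstein confidence widths; the sketch below evaluates them numerically. The exact placement of the square roots, and the function names, are my assumptions from the garbled source, not a verified transcription:

```python
import math

# Hedged sketch of the quoted confidence terms, in the style of empirical
# Bernstein bounds. S and A stand for |S| and |A|; delta is the allowed
# failure probability; n is the visit count of the (s, a) pair.

def xi(n, S, A, delta):
    """xi(n) = log(4 n^2 |S|^2 |A| / delta) / (n - 1)."""
    return math.log(4 * n**2 * S**2 * A / delta) / (n - 1)

def psi_sas(n, p_hat, S, A, delta):
    """Width for one estimated transition probability p_hat."""
    x = xi(n, S, A, delta)
    return math.sqrt(2 * p_hat * (1 - p_hat) * x) + (7 / 3) * x

def psi(n, S, A, delta):
    """Distribution-free width."""
    x = xi(n, S, A, delta)
    return math.sqrt(0.5 * x) + (7 / 3) * x

# The widths shrink as the visit count n grows.
wide = psi_sas(10, 0.5, S=20, A=4, delta=0.1)
narrow = psi_sas(10_000, 0.5, S=20, A=4, delta=0.1)
```

As expected for such bounds, `wide` is much larger than `narrow`: the interval around each estimated transition probability tightens roughly at a $\sqrt{\log(n)/n}$ rate.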
“…These assumptions interact with the environment in complex ways, making them impractical, if not impossible, to compute. The situation is further complicated by recent theoretical results [61,7] showing that there exist LTL tasks that are not PAC-MDP-learnable.…”
Section: Introduction
confidence: 99%
“…RL from high-level specifications. There has been recent work on using specifications based on temporal logic for specifying RL tasks in the single agent setting; a comprehensive survey may be found in [2]. There has also been recent work on using temporal logic specifications for multi-agent RL [10,24], but these approaches focus on cooperative scenarios in which there is a common objective that all agents are trying to achieve.…”
Section: Related Work
confidence: 99%
“…Here, achieve and ensuring correspond to the "eventually" and "always" operators in temporal logic. We do not require that the subgoal regions partition the state space or that they be non-overlapping.…”
confidence: 99%
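The correspondence this statement mentions — "achieve" with "eventually" (◇) and "ensuring" with "always" (□) — reduces, over a finite trace, to an existential versus a universal check. A minimal sketch (function names are mine, not an API from the cited work):

```python
# Minimal finite-trace semantics of the two temporal operators.
# Names are illustrative, not taken from the cited paper.

def eventually(trace, pred):
    """Eventually (◇): pred holds at some position of the trace."""
    return any(pred(s) for s in trace)

def always(trace, pred):
    """Always (□): pred holds at every position of the trace."""
    return all(pred(s) for s in trace)

# A trace of (position, hazard-flag) states.
trace = [(0, 0), (1, 0), (2, 1)]
reached = eventually(trace, lambda s: s[0] == 2)  # subgoal visited
safe = always(trace, lambda s: s[1] <= 1)         # invariant maintained
```

Since the quote notes that subgoal regions may overlap, `pred` here is an arbitrary membership test rather than a cell of a state-space partition.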