2021
DOI: 10.48550/arxiv.2111.00272
Preprint
A Framework for Transforming Specifications in Reinforcement Learning

Abstract: Reactive synthesis algorithms allow automatic construction of policies to control an environment modeled as a Markov Decision Process (MDP) that are optimal with respect to high-level temporal logic specifications, assuming the MDP model is known a priori. Reinforcement learning algorithms, in contrast, are designed to learn an optimal policy when the transition probabilities of the MDP are unknown, but require the user to associate local rewards with transitions. The appeal of high-level temporal logic specific…
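The contrast the abstract draws, between a global temporal-logic objective and the local per-transition rewards RL algorithms require, can be illustrated with a hedged sketch. The two-state monitor and all names below are illustrative assumptions, not the paper's actual transformation: a tiny automaton tracks progress toward an "eventually reach g" specification, and reward is emitted only on the transition where the monitor first accepts.

```python
# Hedged sketch: compiling an "eventually reach g" specification into
# local rewards via a two-state monitor. All names are illustrative;
# this is not the paper's construction.

def monitor_step(q, state, goal):
    """Advance the spec monitor: q=0 means 'goal not yet seen', q=1 'done'."""
    return 1 if q == 1 or state == goal else 0

def local_reward(q, q_next):
    """Reward only the transition on which the monitor first accepts."""
    return 1.0 if q == 0 and q_next == 1 else 0.0

# Run the monitor along a trajectory of environment states.
trajectory = ["s0", "s1", "g", "s2"]
q, total = 0, 0.0
for state in trajectory:
    q_next = monitor_step(q, state, "g")
    total += local_reward(q, q_next)
    q = q_next
```

Because the monitor state is folded into the agent's state, an off-the-shelf RL algorithm can then maximize this local reward in place of the original temporal objective.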

Cited by 3 publications (5 citation statements)
References 16 publications (27 reference statements)
“…There has been recent work on using specifications based on temporal logic for specifying RL tasks in the single agent setting; a comprehensive survey may be found in [2]. There has also been recent work on using temporal logic specifications for multi-agent RL [10,22], but these approaches focus on cooperative scenarios in which there is a common objective that all agents are trying to achieve.…”
Section: RL From High-Level Specifications
confidence: 99%
“…where $\psi_{sas'}(n) \equiv \sqrt{2P(s,a,s')(1-P(s,a,s'))\,\xi(n)} + \frac{7}{3}\xi(n)$, $\psi(n) \equiv \sqrt{\tfrac{1}{2}\xi(n)} + \frac{7}{3}\xi(n)$, and $\xi(n) \equiv \log\!\left(\frac{4n^2|S|^2|A|}{\delta}\right)/(n-1)$. Our assumptions are consistent with the minimal requirements studied by [39] (Remark 4.1).…”
Section: Additional Assumptions and Definitions
confidence: 99%
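The quoted bounds have the shape of empirical-Bernstein confidence widths; the sketch below evaluates them numerically. The exact placement of the square roots, and the function names, are my assumptions from the garbled source, not a verified transcription:

```python
import math

# Hedged sketch of the quoted confidence terms, in the style of empirical
# Bernstein bounds. S and A stand for |S| and |A|; delta is the allowed
# failure probability; n is the visit count of the (s, a) pair.

def xi(n, S, A, delta):
    """xi(n) = log(4 n^2 |S|^2 |A| / delta) / (n - 1)."""
    return math.log(4 * n**2 * S**2 * A / delta) / (n - 1)

def psi_sas(n, p_hat, S, A, delta):
    """Width for one estimated transition probability p_hat."""
    x = xi(n, S, A, delta)
    return math.sqrt(2 * p_hat * (1 - p_hat) * x) + (7 / 3) * x

def psi(n, S, A, delta):
    """Distribution-free width."""
    x = xi(n, S, A, delta)
    return math.sqrt(0.5 * x) + (7 / 3) * x

# The widths shrink as the visit count n grows.
wide = psi_sas(10, 0.5, S=20, A=4, delta=0.1)
narrow = psi_sas(10_000, 0.5, S=20, A=4, delta=0.1)
```

As expected for such bounds, `wide` is much larger than `narrow`: the interval around each estimated transition probability tightens roughly at a $\sqrt{\log(n)/n}$ rate.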
“…These assumptions interact with the environment in complex ways, making them impractical, if not impossible, to compute. The situation is further complicated by recent theoretical results [61,7] showing that there exist LTL tasks that are not PAC-MDP-learnable.…”
Section: Introduction
confidence: 99%
“…RL from high-level specifications. There has been recent work on using specifications based on temporal logic for specifying RL tasks in the single agent setting; a comprehensive survey may be found in [2]. There has also been recent work on using temporal logic specifications for multi-agent RL [10,24], but these approaches focus on cooperative scenarios in which there is a common objective that all agents are trying to achieve.…”
Section: Related Work
confidence: 99%
“…Here, achieve and ensuring correspond to the "eventually" and "always" operators in temporal logic. We do not require that the subgoal regions partition the state space or that they be non-overlapping.…”
confidence: 99%
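The correspondence this statement mentions — "achieve" with "eventually" (◇) and "ensuring" with "always" (□) — reduces, over a finite trace, to an existential versus a universal check. A minimal sketch (function names are mine, not an API from the cited work):

```python
# Minimal finite-trace semantics of the two temporal operators.
# Names are illustrative, not taken from the cited paper.

def eventually(trace, pred):
    """Eventually (◇): pred holds at some position of the trace."""
    return any(pred(s) for s in trace)

def always(trace, pred):
    """Always (□): pred holds at every position of the trace."""
    return all(pred(s) for s in trace)

# A trace of (position, hazard-flag) states.
trace = [(0, 0), (1, 0), (2, 1)]
reached = eventually(trace, lambda s: s[0] == 2)  # subgoal visited
safe = always(trace, lambda s: s[1] <= 1)         # invariant maintained
```

Since the quote notes that subgoal regions may overlap, `pred` here is an arbitrary membership test rather than a cell of a state-space partition.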