Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/840

LTL and Beyond: Formal Languages for Reward Function Specification in Reinforcement Learning

Abstract: In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in va…

Cited by 123 publications (110 citation statements). References: 0 publications.

Citation statements (ordered by relevance):
“…Solving MDPs with non-Markovian rewards [Bacchus et al., 1996; Thiébaux et al., 2006; Brafman et al., 2018] with PLTLf/PLDLf rewards is EXPTIME-complete in the domain and EXPTIME in the PLTLf/PLDLf rewards, while the latter is 2EXPTIME-complete for LTLf/LDLf rewards [Brafman et al., 2018]. Reinforcement Learning where rewards are based on traces [De Giacomo et al., 2019; Camacho et al., 2019] with PLTLf/PLDLf rewards also gains the exponential improvement. Planning in non-Markovian domains [Brafman and De Giacomo, 2019a], with both the non-Markovian domain and the goal expressed in PLTLf/PLDLf, is EXPTIME-complete in the domain and in the goal, vs. 2EXPTIME-complete in the domain and in the goal when these are expressed in LTLf/LDLf.…”
Section: Reverse Languages and AFA (citation type: mentioning)
confidence: 99%
“…This exponential improvement affects the computational complexity of problems involving temporal logics on finite traces in several contexts, including planning in nondeterministic domains (FOND) [Camacho et al., 2017; De Giacomo and Rubin, 2018], reactive synthesis [De Giacomo and Vardi, 2015; Camacho et al., 2018], MDPs with non-Markovian rewards [Bacchus et al., 1996; Brafman et al., 2018], reinforcement learning [De Giacomo et al., 2019; Camacho et al., 2019], and non-Markovian planning and decision problems [Brafman and De Giacomo, 2019a; Brafman and De Giacomo, 2019b].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…where γ is the MDP's discount factor and Φ : S → R is a real-valued function. The automata structure can be exploited by defining F : (U \ {u_A, u_R}) × U → R in terms of the automaton states instead of the MDP states (Camacho et al., 2019; Furelos-Blanco et al., 2020):…”
Section: Option Modeling Given a Subgoal Automaton (citation type: mentioning)
confidence: 99%
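For context, the shaping function F referenced in the quoted passage follows the usual potential-based form. A minimal sketch, assuming the potential is taken over the reward-machine states U (with the accepting and rejecting states u_A and u_R excluded from the first argument), written in LaTeX:

\[
  F(u, u') \;=\; \gamma\,\Phi(u') - \Phi(u),
  \qquad u \in U \setminus \{u_A, u_R\},\ u' \in U,
\]

where \Phi here maps automaton states (rather than MDP states) to real values; the exact definition used by the cited works is not shown in the excerpt.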
“…Automaton structures have also been exploited in reward machines to give bonus reward signals. Camacho et al. (2019) convert reward functions expressed in various formal languages (e.g., linear temporal logic) into RMs, and propose a reward shaping method that runs value iteration on the RM states. Similarly, Camacho et al. (2017) use automata as representations of non-Markovian rewards and exploit their structure to guide the search of an MDP planner using reward shaping.…”
Section: Automata In Reinforcement Learning (citation type: mentioning)
confidence: 99%
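The value-iteration-on-RM-states idea described in the statement above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation; the RewardMachine fields and function names below are assumptions made for the example. The idea is to treat the reward machine as a small deterministic graph, run value iteration over its states, and use the resulting values as shaping potentials.

# Illustrative sketch (not the cited implementation): value iteration over the
# states of a reward machine, whose values can serve as shaping potentials.
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class RewardMachine:
    states: Set[str]                                                      # automaton states U
    transitions: Dict[str, Set[str]] = field(default_factory=dict)        # u -> successor states u'
    rewards: Dict[Tuple[str, str], float] = field(default_factory=dict)   # (u, u') -> reward


def rm_value_iteration(rm: RewardMachine, gamma: float = 0.9,
                       tol: float = 1e-8) -> Dict[str, float]:
    """Compute V(u) for every reward-machine state by value iteration.

    Each RM transition is treated as a choosable action; states with no
    outgoing transitions (e.g. terminal states) keep value 0.
    """
    v = {u: 0.0 for u in rm.states}
    while True:
        delta = 0.0
        for u in rm.states:
            successors = rm.transitions.get(u, set())
            if not successors:
                continue
            best = max(rm.rewards.get((u, u2), 0.0) + gamma * v[u2]
                       for u2 in successors)
            delta = max(delta, abs(best - v[u]))
            v[u] = best
        if delta < tol:
            return v

A step that moves the machine from u to u' would then add gamma * V[u'] - V[u] to the environment reward, which is the standard potential-based shaping term and therefore preserves optimal policies.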
“…In order to compute the policy, the PUnS instance is first compiled into a reward machine ([26]) corresponding to a Markov representation for P(ϕ), represented as a deterministic MDP,…”
Section: Planning With Uncertain Specifications (citation type: mentioning)
confidence: 99%
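As a rough illustration of the compilation step mentioned above, a product construction pairs each environment state with a reward-machine state so that the reward becomes Markovian on the joint state. The sketch below is an assumption-laden Python outline, not the cited system; env_step, rm_delta, rm_reward, and labeler are hypothetical callables standing in for the environment dynamics, the machine's transition function, its reward function, and the labelling of steps with propositions.

# Illustrative sketch (assumed interfaces): the product of an environment MDP
# with a reward machine is itself a Markov decision process over (s, u) pairs.
def make_product_step(env_step, rm_delta, rm_reward, labeler):
    """Return a step function for the product MDP."""
    def step(state, action):
        s, u = state
        s_next = env_step(s, action)                       # environment transition
        u_next = rm_delta(u, labeler(s, action, s_next))   # RM reads the step's labels
        return (s_next, u_next), rm_reward(u, u_next)      # reward is Markovian on (s, u)
    return step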