Reinforcement learning with temporal logic rewards

Li, Xiao; Vasile, Cristian-Ioan; Belta, Călin

doi:10.1109/iros.2017.8206234

Cited by 126 publications

(119 citation statements)

References 17 publications

(30 reference statements)

Supporting

Mentioning: 115

Contrasting

“…These definitions will be used in Section IV. As it will be discussed in Appendix A, the transition from $q_B$ to $q_B'$ is enabled if a Boolean formula denoted by $b^{q_B,q_B'}$ (see (15)) is satisfied. Given $b^{q_B,q_B'}$ we define the set $\Sigma^{q_B,q_B'}$ that collects all feasible symbols $\sigma^{q_B,q_B'}$ that satisfy $b^{q_B,q_B'}$, i.e., $\sigma^{q_B,q_B'} \models b^{q_B,q_B'}$.…”

Section: B Distance Metric Over the NBA (mentioning)

confidence: 99%
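
The quoted definition collects every symbol that satisfies the Boolean formula guarding an automaton transition. A minimal sketch of that idea follows, assuming a symbol is simply the set of atomic propositions that hold and that the formula is given as a Python predicate; the proposition names `pi1`, `pi2`, `pi3` and the example formula are hypothetical and are not taken from the cited paper.

```python
from itertools import chain, combinations

def powerset(props):
    """Return every subset of props as a frozenset (one candidate symbol each)."""
    items = sorted(props)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def feasible_symbols(props, formula):
    """Return the symbols (sets of true propositions) that satisfy formula."""
    return [sigma for sigma in powerset(props) if formula(sigma)]

# Hypothetical atomic propositions and transition formula: pi1 AND NOT pi3.
AP = {"pi1", "pi2", "pi3"}

def transition_formula(sigma):
    return "pi1" in sigma and "pi3" not in sigma

symbols = feasible_symbols(AP, transition_formula)
for sigma in symbols:
    print(sorted(sigma))
```

Brute-force enumeration is exponential in the number of propositions, so this only illustrates the definition; it is not how a planner would necessarily compute the set.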

Reactive Temporal Logic Planning for Multiple Robots in Unknown Environments

Kantaros, Malencia, Kumar et al. 2020

2020 IEEE International Conference on Robotics and Automation (ICRA)

This paper proposes a new reactive temporal logic planning algorithm for multiple robots that operate in environments with unknown geometry modeled using occupancy grid maps. The robots are equipped with individual sensors that allow them to continuously learn a grid map of the unknown environment using existing Simultaneous Localization and Mapping (SLAM) methods. The goal of the robots is to accomplish complex collaborative tasks, captured by global Linear Temporal Logic (LTL) formulas. The majority of existing LTL planning approaches rely on discrete abstractions of the robot dynamics operating in known environments and, as a result, they cannot be applied to the more realistic scenarios where the environment is initially unknown. In this paper, we address this novel challenge by proposing the first reactive, abstraction-free, and distributed LTL planning algorithm that can be applied for complex mission planning of multiple robots operating in unknown environments. The proposed algorithm is reactive, i.e., planning adapts to the updated environmental map, and abstraction-free, as it does not rely on designing abstractions of the robot dynamics. Also, our algorithm is distributed in the sense that the global LTL task is decomposed into single-agent reachability problems constructed online based on the continuously learned map. The proposed algorithm is complete under mild assumptions on the structure of the environment and the sensor models. We provide extensive numerical simulations and hardware experiments that illustrate the theoretical analysis and show that the proposed algorithm can address complex planning tasks for large-scale multi-robot systems in unknown environments.

“…, we have that once the robots reach $q_B^{\text{next}}$ they will be able to stay in this state as long as they keep generating this symbol; see (15) in Appendix A. With slight abuse of notation, we denote the selected symbol by $\sigma^{\text{next}}$ [line 3, Alg.…”

Section: A Distributed Construction Of Robot Paths (mentioning)

confidence: 99%

Reactive Temporal Logic Planning for Multiple Robots in Unknown Environments

Kantaros, Malencia, Kumar et al. 2020

2020 IEEE International Conference on Robotics and Automation (ICRA)

“…In this section, we provide definitions for TLTL (refer to our previous work [17] for a more elaborate discussion of TLTL). A TLTL formula is defined over predicates of form $f(s) < c$, where $f : \mathbb{R}^n \to \mathbb{R}$ is a function of state and $c$ is a constant.…”

Section: Preliminaries A Truncated Linear Temporal Logic (TLTL) (mentioning)

confidence: 99%

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Xiao, Belta 2018

2018 Annual American Control Conference (ACC)

Self Cite

Reward engineering is an important aspect of reinforcement learning. Whether or not the users' intentions can be correctly encapsulated in the reward function can significantly impact the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This operation can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
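
The abstract describes translating a TL formula into a real-valued function that scores a trajectory. The sketch below illustrates that idea for predicates of the form f(s) < c, assuming a common quantitative semantics in which the predicate scores c - f(s), "eventually" takes the maximum over the trajectory, and "always" takes the minimum. These definitions, the function names, and the example trajectory are assumptions for illustration and may differ from the exact semantics used in the cited paper.

```python
from typing import Callable, Sequence

State = Sequence[float]

def predicate_robustness(f: Callable[[State], float], c: float, s: State) -> float:
    """Score of the predicate f(s) < c at one state; positive means satisfied."""
    return c - f(s)

def eventually(f: Callable[[State], float], c: float, traj: Sequence[State]) -> float:
    """Best score along the trajectory: positive if f(s) < c holds at some step."""
    return max(predicate_robustness(f, c, s) for s in traj)

def always(f: Callable[[State], float], c: float, traj: Sequence[State]) -> float:
    """Worst score along the trajectory: positive only if f(s) < c holds at every step."""
    return min(predicate_robustness(f, c, s) for s in traj)

# Hypothetical 1-D trajectory moving toward a goal at position 1.0.
trajectory = [(0.0,), (0.5,), (1.0,)]

def distance_to_goal(s: State) -> float:
    return abs(s[0] - 1.0)

reach_score = eventually(distance_to_goal, 0.25, trajectory)
stay_score = always(distance_to_goal, 0.25, trajectory)
print(reach_score, stay_score)
```

A positive `reach_score` indicates the trajectory gets within 0.25 of the goal at some step, while a negative `stay_score` indicates it is not within 0.25 at every step. A score of this kind can serve as a reward signal without hand-tuned shaping terms.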

“…In this paper we concentrate on model-free approaches and infinitary behaviors for finite MDPs. Related problems include model-based RL [9], RL for finite-horizon objectives [14], and learning for efficient verification [3]. This paper is organized as follows.…”

Section: Introduction (mentioning)

confidence: 99%

Omega-Regular Objectives in Model-Free Reinforcement Learning

Hahn, Perez, Schewe et al. 2019

Lecture Notes in Computer Science

We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.

An ω-word $w$ on an alphabet $\Sigma$ is a function $w : \mathbb{N} \to \Sigma$. We abbreviate $w(i)$ by $w_i$. The set of ω-words on $\Sigma$ is written $\Sigma^\omega$ and a subset of $\Sigma^\omega$ is an ω-language on $\Sigma$.

A probability distribution over a finite set $S$ is a function $d : S \to [0, 1]$ such that $\sum_{s \in S} d(s) = 1$. Let $D(S)$ denote the set of all discrete distributions over $S$. We say a distribution $d \in D(S)$ is a point distribution if $d(s) = 1$ for some $s \in S$. For a distribution $d \in D(S)$ we write $\operatorname{supp}(d) \stackrel{\text{def}}{=} \{s \in S : d(s) > 0\}$.
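
The definitions of a distribution, its support, and a point distribution can be checked directly on small examples. The sketch below assumes a distribution over a finite set is stored as a Python dict from states to probabilities; the function names and the state labels `s0`, `s1`, `s2` are hypothetical.

```python
import math

def is_distribution(d, tol=1e-9):
    """True if every value lies in [0, 1] and the values sum to 1 (within tol)."""
    in_range = all(0.0 <= p <= 1.0 for p in d.values())
    return in_range and math.isclose(sum(d.values()), 1.0, abs_tol=tol)

def supp(d):
    """Support of d: the states with strictly positive probability."""
    return {s for s, p in d.items() if p > 0}

def is_point_distribution(d):
    """True if d is a distribution that puts probability 1 on a single state."""
    return is_distribution(d) and any(p == 1 for p in d.values())

# Hypothetical distributions over the finite set {s0, s1, s2}.
d = {"s0": 0.5, "s1": 0.5, "s2": 0.0}
point = {"s0": 1.0, "s1": 0.0, "s2": 0.0}

print(is_distribution(d), sorted(supp(d)), is_point_distribution(d))
print(is_point_distribution(point), sorted(supp(point)))
```

The tolerance in `is_distribution` is a design choice to absorb floating-point rounding when probabilities are not exactly representable.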
