2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017
DOI: 10.1109/iros.2017.8206234

Reinforcement learning with temporal logic rewards

Abstract: Reinforcement learning (RL) depends critically on the choice of reward functions used to capture the desired behavior and constraints of a robot. Usually, these are handcrafted by an expert designer and represent heuristics for relatively simple tasks. Real-world applications typically involve more complex tasks with rich temporal and logical structure. In this paper we take advantage of the expressive power of temporal logic (TL) to specify complex rules the robot should follow, and incorporate domain…

Cited by 126 publications (119 citation statements)
References 17 publications (30 reference statements)
“…These definitions will be used in Section IV. As it will be discussed in Appendix A, the transition from $q_B$ to $q_B'$ is enabled if a Boolean formula denoted by $b^{q_B,q_B'}$ (see (15)) is satisfied. Given $b^{q_B,q_B'}$ we define the set $\Sigma^{q_B,q_B'}$ that collects all feasible symbols $\sigma^{q_B,q_B'}$ that satisfy $b^{q_B,q_B'}$, i.e., $\sigma^{q_B,q_B'} \models b^{q_B,q_B'}$.…”
Section: B. Distance Metric Over the NBA
Citation type: mentioning
confidence: 99%
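
As a rough illustration of the set of feasible symbols described in the statement above, the sketch below enumerates candidate symbols and keeps those that satisfy a Boolean formula. It assumes that symbols are subsets of a small set of atomic propositions and that the formula is supplied as a Python predicate; the function name `feasible_symbols`, the proposition names, and the example formula are all hypothetical and not taken from the citing paper.

```python
from itertools import combinations


def feasible_symbols(atomic_props, formula):
    """Return every symbol (subset of atomic_props) that satisfies formula.

    Assumption: a symbol is modeled as a frozenset of the atomic
    propositions that are true, and formula is a callable returning bool.
    """
    props = sorted(atomic_props)
    symbols = []
    for size in range(len(props) + 1):
        for subset in combinations(props, size):
            sigma = frozenset(subset)
            if formula(sigma):
                symbols.append(sigma)
    return symbols


# Hypothetical transition formula: "a holds and b does not".
def b_formula(sigma):
    return "a" in sigma and "b" not in sigma


sigma_set = feasible_symbols({"a", "b", "c"}, b_formula)
```

Brute-force enumeration is exponential in the number of propositions, so this is only meant for small examples.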
“…, we have that once the robots reach $q_B^{\text{next}}$ they will be able to stay in this state as long as they keep generating this symbol; see (15) in Appendix A. With slight abuse of notation, we denote the selected symbol by $\sigma^{\text{next}}$ [line 3, Alg.…”
Section: A. Distributed Construction of Robot Paths
Citation type: mentioning
confidence: 99%
“…In this section, we provide definitions for TLTL (refer to our previous work [17] for a more elaborate discussion of TLTL). A TLTL formula is defined over predicates of form $f(s) < c$, where $f : \mathbb{R}^n \to \mathbb{R}$ is a function of state and $c$ is a constant.…”
Section: Preliminaries: A. Truncated Linear Temporal Logic (TLTL)
Citation type: mentioning
confidence: 99%
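
The predicate form quoted above can be sketched in a few lines. This is a minimal illustration, assuming a two-dimensional state and a hypothetical state function `dist_to_goal`; the helper names are not from the cited work. The margin `c - f(s)` is included as one common way to quantify how strongly such a predicate holds, and should be read as an assumption, not as a definition drawn from the statement itself.

```python
import math


def predicate_holds(f, c, s):
    """Evaluate the predicate f(s) < c for state s."""
    return f(s) < c


def predicate_margin(f, c, s):
    """Signed margin c - f(s): positive exactly when f(s) < c (assumed measure)."""
    return c - f(s)


def dist_to_goal(s):
    """Hypothetical f: Euclidean distance from state s to the point (1, 1)."""
    return math.hypot(s[0] - 1.0, s[1] - 1.0)


state = (1.0, 1.5)
holds = predicate_holds(dist_to_goal, 1.0, state)
margin = predicate_margin(dist_to_goal, 1.0, state)
```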
“…In this paper we concentrate on model-free approaches and infinitary behaviors for finite MDPs. Related problems include model-based RL [9], RL for finite-horizon objectives [14], and learning for efficient verification [3]. This paper is organized as follows.…”
Section: Introduction
Citation type: mentioning
confidence: 99%