53rd IEEE Conference on Decision and Control 2014
DOI: 10.1109/cdc.2014.7039527

A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications

Abstract: We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property. The reward function of the product MDP is defined from the acceptance condition of the Rabin automaton. This construction allows us to apply techniques from learning theory to the problem of synthesis for LTL specifications even …
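The abstract describes the construction in words; the following Python sketch shows one plausible shape of it. The data structures (`mdp_trans`, `dra_delta`, `labeling`) and the single Rabin acceptance pair (`accept`, `reject`) are assumptions made for illustration, not the paper's actual implementation.

```python
def product_transition(mdp_trans, dra_delta, labeling, state, action):
    """Successor distribution in the product MDP: take a step in the
    MDP, then advance the Rabin automaton on the label of the new
    MDP state. Product states are pairs (s, q)."""
    s, q = state
    out = {}
    for s_next, p in mdp_trans[(s, action)].items():
        q_next = dra_delta[(q, labeling[s_next])]
        out[(s_next, q_next)] = out.get((s_next, q_next), 0.0) + p
    return out

def rabin_reward(accept, reject, r_acc=1.0, r_rej=-1.0):
    """Reward defined from one Rabin acceptance pair: positive on
    automaton states to be visited infinitely often, negative on
    states to be avoided, zero elsewhere. Values are illustrative."""
    def reward(state):
        _, q = state
        if q in accept:
            return r_acc
        if q in reject:
            return r_rej
        return 0.0
    return reward
```

With the product transitions and this reward in hand, standard reinforcement-learning or dynamic-programming routines can be run on the product MDP, which is the reduction the abstract describes.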

Cited by 113 publications (145 citation statements). References 14 publications.
“…The PAC MDP is generated via an RL-like algorithm, then value iteration is applied to update state values. A similar model-based solution is proposed in [18]: this also hinges on approximating the transition probabilities, which limits the precision of the policy generation process. Unlike the problem that is considered in this paper, the work in [18] is limited to policies whose traces satisfy the property with probability one.…”
Section: Introduction
Mentioning confidence: 99%
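The quoted passage refers to value iteration over the learned model. As a reference point, here is a generic value-iteration loop over a product MDP of the kind sketched above; the interfaces (`states`, `actions`, `trans`, `reward`) are assumed, and nothing here is taken from [18] or the quoting paper.

```python
def value_iteration(states, actions, trans, reward, gamma=0.99, tol=1e-6):
    """Generic value iteration: repeatedly apply the Bellman optimality
    backup until the largest value change falls below `tol`."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (reward(s2) + gamma * V[s2])
                    for s2, p in trans(s, a).items())
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```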
“…A similar model-based solution is proposed in [18]: this also hinges on approximating the transition probabilities, which limits the precision of the policy generation process. Unlike the problem that is considered in this paper, the work in [18] is limited to policies whose traces satisfy the property with probability one. Moreover, [16]-[18] require learning all transition probabilities of the MDP.…”
Section: Introduction
Mentioning confidence: 99%
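Both quoted passages contrast this line of work with model-based methods that must estimate every transition probability of the MDP. A minimal sketch of such an estimate, by empirical frequencies over sampled (s, a, s') triples, is shown below; it is illustrative only and not the procedure of [16]-[18].

```python
from collections import Counter, defaultdict

def estimate_transitions(samples):
    """Maximum-likelihood estimate of P(s' | s, a) from a list of
    observed (s, a, s_next) triples, by empirical frequencies."""
    counts = defaultdict(Counter)
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1
    probs = {}
    for sa, c in counts.items():
        total = sum(c.values())
        probs[sa] = {s2: n / total for s2, n in c.items()}
    return probs
```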
“…In many studies on LTL synthesis problems using RL, reward functions are formed systematically from automata corresponding to the LTL specification. This direction was first investigated by Sadigh et al. [5], where they defined a reward function based on the acceptance condition of a deterministic Rabin automaton [3] that accepts all words satisfying the LTL specification.…”
Section: Introduction
Mentioning confidence: 99%
“…It would be interesting to see if the synthesis algorithms themselves can be made more scalable using SID, e.g., by combining machine learning algorithms with traditional deductive methods. An initial step towards this objective has been taken by the author and colleagues for strategy synthesis of Markov Decision Processes (MDPs) for LTL objectives [74], but much more remains to be done.…”
Section: Summary Of Other Instances
Mentioning confidence: 99%