53rd IEEE Conference on Decision and Control 2014
DOI: 10.1109/cdc.2014.7039527

A learning based approach to control synthesis of Markov decision processes for linear temporal logic specifications

Abstract: We propose to synthesize a control policy for a Markov decision process (MDP) such that the resulting traces of the MDP satisfy a linear temporal logic (LTL) property. We construct a product MDP that incorporates a deterministic Rabin automaton generated from the desired LTL property. The reward function of the product MDP is defined from the acceptance condition of the Rabin automaton. This construction allows us to apply techniques from learning theory to the problem of synthesis for LTL specifications even …
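The abstract describes the construction in words; the following Python sketch shows one plausible shape of it. The data structures (`mdp_trans`, `dra_delta`, `labeling`) and the single Rabin acceptance pair (`accept`, `reject`) are assumptions made for illustration, not the paper's actual implementation.

```python
def product_transition(mdp_trans, dra_delta, labeling, state, action):
    """Successor distribution in the product MDP: take a step in the
    MDP, then advance the Rabin automaton on the label of the new
    MDP state. Product states are pairs (s, q)."""
    s, q = state
    out = {}
    for s_next, p in mdp_trans[(s, action)].items():
        q_next = dra_delta[(q, labeling[s_next])]
        out[(s_next, q_next)] = out.get((s_next, q_next), 0.0) + p
    return out

def rabin_reward(accept, reject, r_acc=1.0, r_rej=-1.0):
    """Reward defined from one Rabin acceptance pair: positive on
    automaton states to be visited infinitely often, negative on
    states to be avoided, zero elsewhere. Values are illustrative."""
    def reward(state):
        _, q = state
        if q in accept:
            return r_acc
        if q in reject:
            return r_rej
        return 0.0
    return reward
```

With the product transitions and this reward in hand, standard reinforcement-learning or dynamic-programming routines can be run on the product MDP, which is the reduction the abstract describes.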

Cited by 113 publications (145 citation statements). References 14 publications.
“…The PAC MDP is generated via an RL-like algorithm, then value iteration is applied to update state values. A similar model-based solution is proposed in [18]: this also hinges on approximating the transition probabilities, which limits the precision of the policy generation process. Unlike the problem that is considered in this paper, the work in [18] is limited to policies whose traces satisfy the property with probability one.…”
Section: Introduction
Mentioning confidence: 99%
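The quoted passage refers to value iteration over the learned model. As a reference point, here is a generic value-iteration loop over a product MDP of the kind sketched above; the interfaces (`states`, `actions`, `trans`, `reward`) are assumed, and nothing here is taken from [18] or the quoting paper.

```python
def value_iteration(states, actions, trans, reward, gamma=0.99, tol=1e-6):
    """Generic value iteration: repeatedly apply the Bellman optimality
    backup until the largest value change falls below `tol`."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (reward(s2) + gamma * V[s2])
                    for s2, p in trans(s, a).items())
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V
```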
“…A similar model-based solution is proposed in [18]: this also hinges on approximating the transition probabilities, which limits the precision of the policy generation process. Unlike the problem that is considered in this paper, the work in [18] is limited to policies whose traces satisfy the property with probability one. Moreover, [16]-[18] require learning all transition probabilities of the MDP.…”
Section: Introduction
Mentioning confidence: 99%
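Both quoted passages contrast this line of work with model-based methods that must estimate every transition probability of the MDP. A minimal sketch of such an estimate, by empirical frequencies over sampled (s, a, s') triples, is shown below; it is illustrative only and not the procedure of [16]-[18].

```python
from collections import Counter, defaultdict

def estimate_transitions(samples):
    """Maximum-likelihood estimate of P(s' | s, a) from a list of
    observed (s, a, s_next) triples, by empirical frequencies."""
    counts = defaultdict(Counter)
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1
    probs = {}
    for sa, c in counts.items():
        total = sum(c.values())
        probs[sa] = {s2: n / total for s2, n in c.items()}
    return probs
```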
“…In many studies on LTL synthesis problems using RL, reward functions are formed systematically from automata corresponding to the LTL specification. This direction was first investigated by Sadigh et al. [5], where they defined a reward function based on the acceptance condition of a deterministic Rabin automaton [3] that accepts all words satisfying the LTL specification.…”
Section: Introduction
Mentioning confidence: 99%
“…It would be interesting to see if the synthesis algorithms themselves can be made more scalable using SID, e.g., by combining machine learning algorithms with traditional deductive methods. An initial step towards this objective has been taken by the author and colleagues for strategy synthesis of Markov Decision Processes (MDPs) for LTL objectives [74], but much more remains to be done.…”
Section: Summary Of Other Instances
Mentioning confidence: 99%