2019 IEEE 58th Conference on Decision and Control (CDC)
DOI: 10.1109/cdc40024.2019.9028919
Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees

Abstract: Reinforcement Learning (RL) has emerged as an efficient method of choice for solving complex sequential decision making problems in automatic control, computer science, economics, and biology. In this paper we present a model-free RL algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving ri…

Cited by 90 publications (80 citation statements). References 32 publications.
“…Any LTL formula ϕ can be converted into various ω-automata, namely finite state machines that recognize all infinite words satisfying ϕ. We review a generalized Büchi automaton at the beginning, and then introduce a limit-deterministic generalized Büchi automaton [10].…”
Section: B. Linear Temporal Logic and Automata
confidence: 99%
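The quoted passage concerns generalized Büchi automata, which accept an infinite word when every one of their accepting state sets is visited infinitely often. As a minimal illustrative sketch (not the paper's LDGBA construction), acceptance can be checked for an ultimately periodic "lasso" word prefix·cycle^ω over a deterministic automaton given as a transition dictionary:

```python
def accepts_lasso(delta, init, acc_sets, prefix, cycle):
    """Decide whether a deterministic generalized Buchi automaton
    accepts the ultimately periodic word prefix . cycle^omega.

    delta    : dict mapping (state, letter) -> next state
    acc_sets : list of accepting state sets; each must be visited
               infinitely often (generalized Buchi condition)
    """
    s = init
    for letter in prefix:
        s = delta[(s, letter)]
    # Pump the cycle until the state at a cycle boundary repeats;
    # from then on the run is periodic.
    starts = []
    while s not in starts:
        starts.append(s)
        for letter in cycle:
            s = delta[(s, letter)]
    # One full period starting from the repeating state collects
    # exactly the states that are visited infinitely often.
    visited, t = set(), s
    while True:
        for letter in cycle:
            t = delta[(t, letter)]
            visited.add(t)
        if t == s:
            break
    return all(visited & F for F in acc_sets)
```

For example, an automaton whose state records the last letter read, with accepting sets {'A'} and {'B'}, accepts (ab)^ω but rejects a^ω — i.e., it encodes "infinitely often a and infinitely often b".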
“…Through the above scenario, we compare our approach with 1) a case where we first convert the tLDGBA into a tLDBA, for which the augmentation makes no change, and thus a reward function in Definition 10 is based on a single accepting set; and 2) the method using a reward function based on the accepting frontier function [9], [10]. For the three methods, we use Q-learning with an epsilon-greedy policy.…”
Section: Example
confidence: 99%
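The comparison above relies on tabular Q-learning with an epsilon-greedy policy. As a hedged sketch of that basic learning loop (the chain MDP, state and action counts, and hyperparameters here are illustrative stand-ins, not the product MDP or rewards from the paper):

```python
import random

def q_learning(n_states=4, n_actions=2, episodes=2000,
               alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy on a toy
    chain MDP: action 0 moves left, action 1 moves right, and
    reaching the last state yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[s][act])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # standard Q-learning temporal-difference update
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy derived from Q prefers "right" in every non-terminal state, which is optimal for this toy chain; the papers compared in the quote differ only in how the reward r is derived from the automaton's accepting sets, not in this update rule.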