2020
DOI: 10.1109/lcsys.2020.2980552
Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

Abstract: This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records the previous visits to…

Cited by 25 publications (19 citation statements)
References 14 publications
“…To facilitate learning of optimal policies, the designed reward is enhanced with potential functions that effectively guide the agent toward task satisfaction without adding extra hyper-parameters to the algorithm. Unlike [18], rigorous analysis shows that the maximum probability of task satisfaction can be guaranteed. Compared to approaches based on limit deterministic Büchi automata (LDBA), e.g., [16,17], LDGBA has several accepting sets while LDBA only has one accepting set which can result in sparse rewards during training.…”
Section: Contributions
confidence: 99%
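The comparison above hinges on LDGBA having several accepting sets where an LDBA has only one, which densifies the reward signal. A minimal sketch of that idea, assuming a hypothetical reward function (names like `ldgba_reward` and the visited-set bookkeeping are illustrative, not from the paper, which augments the automaton state to record these visits):

```python
# Hypothetical sketch: per-step reward in a product with an LDGBA,
# tracking which accepting sets have been visited in the current round.
# All names here are illustrative assumptions.

def ldgba_reward(state_sets, visited, r_accept=1.0):
    """Return r_accept when entering an accepting set not yet visited.

    state_sets: frozenset of accepting-set indices that the current
                automaton state belongs to.
    visited:    set of indices already visited this round (mutated).
    """
    new = state_sets - visited
    if new:
        visited |= new
        return r_accept
    return 0.0

# With K accepting sets the agent can collect up to K rewards per round;
# a single accepting set (the LDBA case) yields at most one, hence the
# sparser training signal noted above.
visited = set()
r1 = ldgba_reward(frozenset({0}), visited)     # first visit to set 0
r2 = ldgba_reward(frozenset({0}), visited)     # already visited
r3 = ldgba_reward(frozenset({1, 2}), visited)  # two new sets at once
```

Once every accepting set has been visited, `visited` would be reset, starting the next round of the generalized Büchi acceptance condition.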
“…We assign each DDPG an individual replay buffer B_{q_i} and a random process noise N_{q_i}. The corresponding weights of the modular networks, i.e., Q_{q_i}(x, u | θ^{Q_{q_i}}) and π_{q_i}(x | θ^{π_{q_i}}), are also updated at each iteration (lines 15-20). All neural networks are trained using their own replay buffer, which is a finite-sized cache that stores transitions sampled from exploring the environment.…”
Section: Modular Deep Deterministic Policy Gradient
confidence: 99%
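The modular bookkeeping described in this quote — one replay buffer and one noise process per automaton state — can be sketched as follows. This is a minimal illustration of the buffer/noise routing only; the per-module networks and their gradient updates are omitted, and all identifiers are assumptions, not the cited paper's code:

```python
import random
from collections import deque

# Hypothetical sketch: one module per automaton state q, holding its own
# replay buffer B_q (a finite-size cache) and exploration-noise process N_q.
class Module:
    def __init__(self, capacity=10_000, sigma=0.1):
        self.buffer = deque(maxlen=capacity)  # B_q: oldest entries evicted
        self.sigma = sigma                    # N_q: Gaussian noise scale

    def noise(self):
        # Simple stand-in for the random exploration process N_q.
        return random.gauss(0.0, self.sigma)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Each module trains only on transitions it observed itself.
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)

modules = {}  # automaton state q -> its own Module

def get_module(q):
    return modules.setdefault(q, Module())

# Transitions observed while the automaton is in state q go to B_q only.
get_module("q0").store(("x", "u", 0.0, "x_next"))
get_module("q1").store(("x_next", "u2", 1.0, "x_next2"))
```

Keeping the buffers separate means each module's critic and actor see only experience gathered under its own automaton state, which is the point of the modular architecture the quote describes.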
“…This latter perspective has recently initiated another wave of research on semi-deterministic automata. Since 2015, many new results have been published: several direct translations of LTL to semideterministic automata [11,15,16,26], specialized complementation constructions for semi-deterministic automata [4,6], algorithms for quantitative model checking of MDPs based on semi-deterministic automata [13,25], a transformation of semideterministic automata to deterministic parity automata [10], and reinforcement learning of control policy using semi-deterministic automata [21].…”
Section: Introduction
confidence: 99%