2020
DOI: 10.1109/lcsys.2020.2980552
Reinforcement Learning of Control Policy for Linear Temporal Logic Specifications Using Limit-Deterministic Generalized Büchi Automata

Abstract: This letter proposes a novel reinforcement learning method for the synthesis of a control policy satisfying a control specification described by a linear temporal logic formula. We assume that the controlled system is modeled by a Markov decision process (MDP). We convert the specification to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets that accepts all infinite sequences satisfying the formula. The LDGBA is augmented so that it explicitly records the previous visits to…

Cited by 25 publications (19 citation statements)
References 14 publications
“…To facilitate learning of optimal policies, the designed reward is enhanced with potential functions that effectively guide the agent toward task satisfaction without adding extra hyper-parameters to the algorithm. Unlike [18], rigorous analysis shows that the maximum probability of task satisfaction can be guaranteed. Compared to approaches based on limit deterministic Büchi automata (LDBA), e.g., [16,17], LDGBA has several accepting sets while LDBA only has one accepting set which can result in sparse rewards during training.…”
Section: Contributions
confidence: 99%
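The comparison above hinges on LDGBA having several accepting sets where an LDBA has only one, which densifies the reward signal. A minimal sketch of that idea, assuming a hypothetical reward function (names like `ldgba_reward` and the visited-set bookkeeping are illustrative, not from the paper, which augments the automaton state to record these visits):

```python
# Hypothetical sketch: per-step reward in a product with an LDGBA,
# tracking which accepting sets have been visited in the current round.
# All names here are illustrative assumptions.

def ldgba_reward(state_sets, visited, r_accept=1.0):
    """Return r_accept when entering an accepting set not yet visited.

    state_sets: frozenset of accepting-set indices that the current
                automaton state belongs to.
    visited:    set of indices already visited this round (mutated).
    """
    new = state_sets - visited
    if new:
        visited |= new
        return r_accept
    return 0.0

# With K accepting sets the agent can collect up to K rewards per round;
# a single accepting set (the LDBA case) yields at most one, hence the
# sparser training signal noted above.
visited = set()
r1 = ldgba_reward(frozenset({0}), visited)     # first visit to set 0
r2 = ldgba_reward(frozenset({0}), visited)     # already visited
r3 = ldgba_reward(frozenset({1, 2}), visited)  # two new sets at once
```

Once every accepting set has been visited, `visited` would be reset, starting the next round of the generalized Büchi acceptance condition.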
“…We assign each DDPG an individual replay buffer B_{q_i} and a random process noise N_{q_i}. The corresponding weights of the modular networks, i.e., Q_{q_i}(x, u | θ^{Q_{q_i}}) and π_{q_i}(x | θ^{π_{q_i}}), are also updated at each iteration (lines 15-20). All neural networks are trained using their own replay buffer, which is a finite-sized cache that stores transitions sampled from exploring the environment.…”
Section: Modular Deep Deterministic Policy Gradient
confidence: 99%
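The modular bookkeeping described in this quote — one replay buffer and one noise process per automaton state — can be sketched as follows. This is a minimal illustration of the buffer/noise routing only; the per-module networks and their gradient updates are omitted, and all identifiers are assumptions, not the cited paper's code:

```python
import random
from collections import deque

# Hypothetical sketch: one module per automaton state q, holding its own
# replay buffer B_q (a finite-size cache) and exploration-noise process N_q.
class Module:
    def __init__(self, capacity=10_000, sigma=0.1):
        self.buffer = deque(maxlen=capacity)  # B_q: oldest entries evicted
        self.sigma = sigma                    # N_q: Gaussian noise scale

    def noise(self):
        # Simple stand-in for the random exploration process N_q.
        return random.gauss(0.0, self.sigma)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Each module trains only on transitions it observed itself.
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)

modules = {}  # automaton state q -> its own Module

def get_module(q):
    return modules.setdefault(q, Module())

# Transitions observed while the automaton is in state q go to B_q only.
get_module("q0").store(("x", "u", 0.0, "x_next"))
get_module("q1").store(("x_next", "u2", 1.0, "x_next2"))
```

Keeping the buffers separate means each module's critic and actor see only experience gathered under its own automaton state, which is the point of the modular architecture the quote describes.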
“…This latter perspective has recently initiated another wave of research on semi-deterministic automata. Since 2015, many new results have been published: several direct translations of LTL to semideterministic automata [11,15,16,26], specialized complementation constructions for semi-deterministic automata [4,6], algorithms for quantitative model checking of MDPs based on semi-deterministic automata [13,25], a transformation of semideterministic automata to deterministic parity automata [10], and reinforcement learning of control policy using semi-deterministic automata [21].…”
Section: Introduction
confidence: 99%