2015 American Control Conference (ACC)
DOI: 10.1109/acc.2015.7171174
Counterexample-guided permissive supervisor synthesis for probabilistic systems through learning

Abstract: Formal methods in robotic motion planning have recently emerged as a hot research topic due to their correct-by-design nature, and most results have been based on non-probabilistic discrete models. To better handle environment uncertainties, sensor noise, and actuator imperfection, control problems in probabilistic systems such as Markov Chains (MCs) and Markov Decision Processes (MDPs) have also been studied. Most existing methods are based either on probabilistic model checking or on reinforcement learning orie…

Cited by 11 publications (5 citation statements). References 19 publications (18 reference statements).
“…This paper merges and further develops our previous results [22], [23]. While the same L* learning algorithm [14] was considered in the synthesis process, we changed the form of the supervisor and considered positive counterexamples so that the results can become less conservative.…”
Section: Introduction (supporting)
Confidence: 60%
“…The resulting π_i will then be passed to the third stage to answer the conjecture in our modified L* learning algorithm. Note that in our previous work [22], [23], we proposed applying the hop-constrained k shortest path (HKSP) algorithm to find the k smallest paths, so that eliminating them ensures the DTMC is no longer a counterexample. The complexity of HKSP is higher than that of HSP, since HKSP has to search over k as well.…”
Section: B. Single Agent Case (mentioning)
Confidence: 99%
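The hop-constrained shortest-path idea quoted above can be sketched as a best-first enumeration of the most probable paths in a DTMC, stopping after k paths or when the hop bound is exceeded. This is a minimal illustration under assumptions: the function name, the dictionary encoding of the DTMC in `trans`, and the stopping rule are illustrative, not the authors' implementation.

```python
import heapq

def hop_constrained_k_shortest_paths(trans, init, targets, k, max_hops):
    """Enumerate up to k most-probable paths from init to a target state,
    using at most max_hops transitions.

    trans maps a state to a list of (next_state, probability) pairs;
    probabilities are multiplied along a path. "Shortest" here means
    highest probability, matching counterexample generation for DTMCs.
    """
    # Heap entries are (-probability, path) so the most probable
    # partial path is expanded first.
    heap = [(-1.0, (init,))]
    found = []
    while heap and len(found) < k:
        neg_prob, path = heapq.heappop(heap)
        state = path[-1]
        if state in targets:
            found.append((-neg_prob, path))
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted; prune this branch
        for nxt, p in trans.get(state, []):
            heapq.heappush(heap, (neg_prob * p, path + (nxt,)))
    return found
```

A counterexample to a probabilistic safety property is then a set of such paths whose combined probability mass exceeds the threshold, which is why the quoted work searches for the k most probable ones.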
“…The L* learning approach assumes the existence of a teacher who can answer membership and equivalence queries (Angluin 1987; Wu and Lin 2015; Wu, Zhang, and Lin 2018; Zhang, Wu, and Lin 2015; …). Our approach fulfills the role of the teacher with an RL engine, and the queries are answered through interaction with the environment via the RL engine.…”
Section: Related Work (mentioning)
Confidence: 99%
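The teacher interface mentioned in this excerpt (membership and equivalence queries, as in Angluin's L*) can be illustrated with a minimal sketch. The `Teacher` class, the toy target language, and the bounded equivalence check below are illustrative assumptions; in the cited work the teacher's role is instead played by an RL engine interacting with the environment.

```python
from itertools import product

class Teacher:
    """Minimal L*-style teacher for a known target language.

    Toy target: strings over {'a', 'b'} containing an even number of 'a's.
    """
    alphabet = ('a', 'b')

    def membership(self, word):
        # Membership query: is this word in the target language?
        return word.count('a') % 2 == 0

    def equivalence(self, hypothesis, max_len=6):
        """Equivalence query, approximated by bounded exhaustive testing.

        hypothesis: callable word -> bool. Returns (True, None) if the
        hypothesis agrees with the target on all words up to max_len,
        otherwise (False, counterexample_word).
        """
        for n in range(max_len + 1):
            for letters in product(self.alphabet, repeat=n):
                w = ''.join(letters)
                if hypothesis(w) != self.membership(w):
                    return False, w
        return True, None
```

The learner repeatedly refines its hypothesis automaton from counterexamples until the equivalence query passes; replacing these hand-coded answers with responses gathered by an RL engine is the substitution the excerpt describes.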
“…Also, due to the partial observability in POMDPs, applying these methods to POMDPs is fundamentally difficult. Besides these works, using the L* algorithm to learn a system supervisor has also been considered in our previous work for MDPs [42]. To apply the L* algorithm to POMDP supervisor synthesis, in this paper we extensively discuss the supervisor synthesis framework and design new membership-query and conjecture-checking rules to overcome the difficulties brought by partial observability.…”
Section: A. Related Work (mentioning)
Confidence: 99%