2020
DOI: 10.1007/978-3-030-63461-2_1
Formal Policy Synthesis for Continuous-State Systems via Reinforcement Learning

Cited by 18 publications (10 citation statements)
References 25 publications
“…The work provides control strategies maximizing the probability of satisfaction over unknown continuous-space dt-SCS while providing probabilistic closeness guarantees in the form of (2.3). Similarly, based on the scheme in Figure 13, extensions to continuous spaces and ω-regular properties with formal guarantees are studied in [HAK19b, HKA20, KS20]. This line of work leads to the following problem.…”
Section: Temporal Logic Verification and Synthesis
confidence: 99%
“…A data-driven technique for satisfying temporal properties on unknown stochastic processes with continuous spaces was recently presented in [KS20]. The proposed framework uses reinforcement learning to compute sub-optimal policies that are finite-memory and deterministic.…”
Section: Directions for Open Research
confidence: 99%
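The quotation above describes the technique only at a high level. As a purely illustrative aid, the sketch below shows the common recipe behind such approaches: run reinforcement learning on the product of an unknown system (accessed only through a simulator) and an automaton that monitors the temporal property, rewarding transitions into accepting automaton states. All names here (GridSim, AUTOMATON, the toy reach-avoid property) are hypothetical stand-ins and not the paper's implementation; the cited work handles continuous state spaces and richer properties than this example.

```python
# Illustrative sketch only (not the paper's implementation): tabular Q-learning on
# the product of a black-box simulator and a small deterministic automaton that
# monitors a reach-avoid property ("reach goal while never visiting bad").
import random
from collections import defaultdict

class GridSim:
    """1-D grid whose dynamics are unknown to the learner (accessed only by sampling)."""
    def __init__(self, size=10):
        self.size, self.s = size, 0
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):                       # actions: -1, +1, +2 (the +2 jump can skip the unsafe cell)
        self.s = max(0, min(self.size - 1, self.s + a))
        return self.s
    def label(self):                         # atomic proposition observed at the current state
        if self.s == self.size - 1: return "goal"
        if self.s == 3:             return "bad"
        return "none"

# Deterministic automaton for "avoid bad until goal": 0 = trying, 1 = accept, 2 = reject.
AUTOMATON = {(0, "none"): 0, (0, "goal"): 1, (0, "bad"): 2,
             (1, "none"): 1, (1, "goal"): 1, (1, "bad"): 1,
             (2, "none"): 2, (2, "goal"): 2, (2, "bad"): 2}
ACTIONS = [-1, +1, +2]

def q_learning(episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    env, Q = GridSim(), defaultdict(float)
    for _ in range(episodes):
        s, q = env.reset(), 0                # product state = (system state, automaton state)
        for _ in range(50):
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda a_: Q[(s, q, a_)]))
            s2 = env.step(a)
            q2 = AUTOMATON[(q, env.label())]
            r = 1.0 if (q2 == 1 and q != 1) else 0.0     # reward only when the property is first satisfied
            Q[(s, q, a)] += alpha * (r + gamma * max(Q[(s2, q2, a_)] for a_ in ACTIONS) - Q[(s, q, a)])
            s, q = s2, q2
            if q != 0:                       # accepted or rejected: end the episode
                break
    return Q
```

Because the greedy policy extracted from Q conditions on the pair (system state, automaton state), its only memory is the automaton state, which is what makes such policies finite-memory and deterministic in the sense used in the quotation.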
“…When complex logical properties are of interest, e.g., those expressed as linear temporal logic formulae over finite traces (a.k.a. LTLf formulae [14]), the results in [15,16,17] provide formal safety guarantees for AI-based controllers by encoding the desired properties in the reward functions. Note that these results are only applicable to AI-based controllers whose reward functions are easy to design, while reward functions for some control tasks are difficult to obtain (e.g., [18]).…”
Section: Introduction
confidence: 99%
“…The closest line of work to ours, which aims to avoid HRL requirements, comprises model-based (Fu and Topcu 2014; Sadigh et al. 2014; Fulton and Platzer 2018; Cai et al. 2021) or model-free RL approaches that constrain the agent with a temporal logic property (Hasanbeig et al. 2018; Toro Icarte et al. 2018; Camacho et al. 2019; Hasanbeig et al. 2019a; Yuan et al. 2019; De Giacomo et al. 2019, 2020; Hasanbeig et al. 2019c,d, 2020b; Kazemi and Soudjani 2020; Lavaei et al. 2020). These approaches are limited to finite-state systems or, more importantly, require the temporal logic formula to be known a priori.…”
Section: Introduction
confidence: 99%