Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints

Carr, Steven; Jansen, Nils; Topcu, Ufuk

doi:10.24963/ijcai.2020/570

Cited by 30 publications (21 citation statements). References 2 publications.

“…Thus, finding a suitable under-approximative value function reduces to finding "good" policies for M, e.g. by using randomly guessed fm-policies, machine learning methods [12], or a transformation to a parametric model [27].…”

Section: 1 Belief Cut-offs (mentioning)

confidence: 99%

Under-Approximating Expected Total Rewards in POMDPs

Bork, Katoen, Quatmann. 2022. Preprint.

We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this (generally undecidable) problem by computing under-approximations of these expected total rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale well while providing tight lower bounds on the expected total reward.
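The cut-off technique summarized in this abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy POMDP, the policy `sigma`, and the helpers `evaluate_policy`, `successors`, and `lower_bound` are hypothetical, and the sketch assumes state-based observations, every action enabled in every state, and non-negative rewards. The belief MDP is unfolded to a fixed depth, and each frontier belief b is cut off with the sum over s of b(s)·V_σ(s), the value of a fixed memoryless observation-based policy σ. Because the result is the value of an actual policy (choose the best action inside the unfolding, then switch to σ), it is a lower bound on the optimal expected total reward. Belief clipping and the MILP are not covered.

```python
from collections import defaultdict

# Hypothetical toy POMDP (not from the paper): peek to learn the side, then choose it.
# Assumption: observations are state-based, i.e. every state emits exactly one observation.
OBS = {"L": "unknown", "R": "unknown", "Lseen": "sawL", "Rseen": "sawR", "G": "goal"}
ACTIONS = ("peek", "left", "right")
GOALS = {"G"}

# TRANS[s][a] = {successor: probability}; every action is enabled in every state.
TRANS = {
    "L": {"peek": {"Lseen": 1.0}, "left": {"G": 1.0}, "right": {"G": 1.0}},
    "R": {"peek": {"Rseen": 1.0}, "left": {"G": 1.0}, "right": {"G": 1.0}},
    "Lseen": {"peek": {"Lseen": 1.0}, "left": {"G": 1.0}, "right": {"G": 1.0}},
    "Rseen": {"peek": {"Rseen": 1.0}, "left": {"G": 1.0}, "right": {"G": 1.0}},
    "G": {a: {"G": 1.0} for a in ACTIONS},
}


def reward(s, a):
    """Reward 10 for choosing the side the agent is actually on, 0 otherwise."""
    if (a == "left" and s in ("L", "Lseen")) or (a == "right" and s in ("R", "Rseen")):
        return 10.0
    return 0.0


def evaluate_policy(sigma, iterations=1000):
    """Expected total reward until GOALS under a memoryless observation-based
    policy sigma (observation -> action), via value iteration starting from 0."""
    v = {s: 0.0 for s in OBS}
    for _ in range(iterations):
        new_v = {}
        for s in OBS:
            if s in GOALS:
                new_v[s] = 0.0
                continue
            a = sigma[OBS[s]]
            new_v[s] = reward(s, a) + sum(p * v[t] for t, p in TRANS[s][a].items())
        v = new_v
    return v


def successors(belief, a):
    """Yield (P(o | belief, a), next belief) for each observation o reachable via a."""
    unnormalized = defaultdict(lambda: defaultdict(float))
    for s, ps in belief.items():
        for t, pt in TRANS[s][a].items():
            unnormalized[OBS[t]][t] += ps * pt
    for dist in unnormalized.values():
        p_obs = sum(dist.values())
        yield p_obs, {t: q / p_obs for t, q in dist.items()}


def lower_bound(belief, depth, v_sigma):
    """Unfold the belief MDP `depth` steps; cut off frontier beliefs with v_sigma."""
    if all(s in GOALS for s in belief):
        return 0.0
    if depth == 0:
        # Cut-off: following sigma from `belief` yields sum_s b(s) * V_sigma(s).
        return sum(p * v_sigma[s] for s, p in belief.items())
    return max(
        sum(p * reward(s, a) for s, p in belief.items())
        + sum(p_o * lower_bound(b_next, depth - 1, v_sigma)
              for p_o, b_next in successors(belief, a))
        for a in ACTIONS
    )


sigma = {"unknown": "left", "sawL": "left", "sawR": "right", "goal": "left"}
v_sigma = evaluate_policy(sigma)
b0 = {"L": 0.5, "R": 0.5}
print(lower_bound(b0, 0, v_sigma))  # 5.0: cut-off only
print(lower_bound(b0, 1, v_sigma))  # 10.0: peeking first pays off
```

With the deliberately poor choice σ("unknown") = "left", the cut-off alone yields 5.0 at the uniform initial belief, while a single unfolding step discovers that peeking first is worth 10.0. A better cut-off policy tightens the bound at smaller depths, which is why the abstract stresses using a good policy.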

“…Previously proposed methods to solve the problem are e.g. to use approximate value iteration [21], optimisation and search techniques [1,11], dynamic programming [6], Monte Carlo simulation [43], game-based abstraction [51], and machine learning [12,13,18]. Other approaches restrict the memory size of the policies [34].…”

Section: Introduction (mentioning)

confidence: 99%

Under-Approximating Expected Total Rewards in POMDPs

Bork, Katoen, Quatmann. 2022. Preprint.

“…The last result, from [1], builds on a novel connection between training of recurrent neural networks and probabilistic model checking. More specifically, recurrent neural networks have emerged as an effective representation of control policies in sequential decision-making problems.…”

Section: CCS Concepts (mentioning)

confidence: 99%

Verifiable autonomy under perceptual limitations

Topcu. 2021. Proceedings of the 1st International Workshop on Verification of Autonomous & Robotic Systems. Self-citation.

“…Building on preliminary results in Carr et al. (2019, 2020), the current paper makes the following contributions. First, it presents an iterative method that employs state-of-the-art tools from machine learning and formal verification to find policies that ensure that an agent in a POMDP satisfies any given linear temporal logic specification.…”

Section: Introduction (mentioning)

confidence: 99%

Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Carr, Jansen, Topcu. 2021. Journal of Artificial Intelligence Research (JAIR). Self-citation.

Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNNs) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. If the Markov chain does not satisfy the specification, verification yields, as a byproduct, diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, three orders of magnitude larger than those handled by comparable approaches.
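The extract-then-verify step described in this abstract can be illustrated end to end with a small Python sketch. It is not the authors' pipeline: the "RNN" is a single hand-weighted hidden unit standing in for a trained network, FSC extraction uses crude sign quantization of the hidden state (one simple option; the paper's extraction procedure may differ), and the model check is hand-rolled value iteration rather than a probabilistic model checker. The toy POMDP, the reach-avoid specification P(¬bad U goal) ≥ λ, and all function names (`extract_fsc`, `induced_chain`, `verify`, ...) are hypothetical. Applying the extracted FSC to the POMDP induces a Markov chain over (state, memory node) pairs, on which the satisfaction probability is computed.

```python
import math
from collections import deque

# Hypothetical toy POMDP (not from the paper): s0 and s1 both emit "x",
# so no memoryless policy can tell them apart.
OBS = {"s0": "x", "s1": "x", "goal": "goal", "bad": "bad"}
TRANS = {  # TRANS[s][a] = {successor: probability}
    "s0": {"go": {"s1": 0.9, "s0": 0.1}, "stop": {"bad": 1.0}},
    "s1": {"go": {"bad": 1.0}, "stop": {"goal": 1.0}},
}
GOAL, BAD = {"goal"}, {"bad"}


# Stand-in for a trained RNN policy: one hidden unit with hand-set weights.
def rnn_step(h, obs):
    return math.tanh(2.0 * h + (3.0 if obs == "x" else 0.0))


def rnn_action(h):
    return "stop" if h > 0 else "go"


def quantize(h):
    """Crude sign quantization of the hidden state into FSC memory nodes."""
    return "pos" if h > 0 else "neg"


def extract_fsc(h0, observations):
    """Explore quantized hidden states reachable from h0. The first concrete hidden
    state that reaches a node is kept as that node's representative."""
    act, upd, reps = {}, {}, {quantize(h0): h0}
    frontier = [quantize(h0)]
    while frontier:
        n = frontier.pop()
        for o in observations:
            act[(n, o)] = rnn_action(reps[n])
            h_next = rnn_step(reps[n], o)
            n_next = quantize(h_next)
            upd[(n, o)] = n_next
            if n_next not in reps:
                reps[n_next] = h_next
                frontier.append(n_next)
    return act, upd, quantize(h0)


def induced_chain(act, upd, s0, n0):
    """Reachable part of the Markov chain over product states (s, n). Assumption:
    the FSC acts and updates memory on the current observation; goal/bad absorb."""
    chain, frontier = {}, deque([(s0, n0)])
    while frontier:
        q = frontier.popleft()
        s, n = q
        if q in chain:
            continue
        if s in GOAL or s in BAD:
            chain[q] = {}
            continue
        o = OBS[s]
        chain[q] = {(t, upd[(n, o)]): p for t, p in TRANS[s][act[(n, o)]].items()}
        frontier.extend(chain[q])
    return chain


def reach_avoid_probability(chain, iterations=10_000):
    """P(not bad U goal) for every product state, by value iteration from 0."""
    x = {q: 1.0 if q[0] in GOAL else 0.0 for q in chain}
    for _ in range(iterations):
        x = {q: sum(p * x[r] for r, p in succ.items()) if succ else x[q]
             for q, succ in chain.items()}
    return x


def verify(act, upd, s0, n0, threshold):
    """Check P(not bad U goal) >= threshold from (s0, n0). As a simple stand-in for
    diagnostic information, also list non-absorbing product states below threshold."""
    chain = induced_chain(act, upd, s0, n0)
    x = reach_avoid_probability(chain)
    critical = sorted((q for q, succ in chain.items() if succ and x[q] < threshold),
                      key=x.get)
    return x[(s0, n0)] >= threshold, x[(s0, n0)], critical


act, upd, n0 = extract_fsc(-1.0, ["x"])
print(act)  # {('neg', 'x'): 'go', ('pos', 'x'): 'stop'}
print(verify(act, upd, "s0", n0, threshold=0.95))  # (False, 0.9, [('s0', 'pos'), ('s0', 'neg')])
```

Here the extracted two-node FSC reaches the goal with probability 0.9, so the check fails for λ = 0.95, and the list of critical product states points first at ("s0", "pos"), where the memory has wrongly concluded that the agent already left s0. A memoryless controller that always plays "go" would reach the goal with probability 0 in this toy model, which is why the extracted memory structure matters.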
