Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization

Suilen, Marnix; Jansen, Nils; Cubuktepe, Murat; Topcu, Ufuk

doi:10.24963/ijcai.2020/569

Cited by 13 publications

(18 citation statements)

References 5 publications

“…Notably, we introduce a novel robust spacecraft motion planning scenario. We show that our method scales to significantly larger models than (Suilen et al 2020). This scalability advantage allows more precise models and adding memory to the policies.…”

Section: Contribution and Approach (mentioning)

confidence: 86%

“…Combining this dualization with a linear-time transformation of the uPOMDP to a so-called simple uPOMDP (Junges et al 2018) ensures exact solutions to the original problem. The exact solutions and the moderate increase in the problem size contrasts with an over-approximative solution computed using an exponentially larger encoding proposed in (Suilen et al 2020). Finite nonconvex problem.…”

Section: Contribution and Approach (mentioning)

confidence: 96%

“…Uncertain MDPs with full observability have been extensively studied, see for instance (Wolff, Topcu, and Murray 2012;Puggelli et al 2013;Hahn et al 2017). For uPOMDPs, (Suilen et al 2020) is also based on convex optimization. However, their resulting optimization problems are exponentially larger than ours, and they only consider memoryless policies.…”

Section: Related Work (mentioning)

confidence: 99%

Robust Finite-State Controllers for Uncertain POMDPs

Cubuktepe, Jansen, Junges et al. 2020

Preprint

Self Cite

Uncertain partially observable Markov decision processes (uPOMDPs) allow the probabilistic transition and observation functions of standard POMDPs to belong to a so-called uncertainty set. Such uncertainty sets capture uncountable sets of probability distributions. We develop an algorithm to compute finite-memory policies for uPOMDPs that robustly satisfy given specifications against any admissible distribution. In general, computing such policies is both theoretically and practically intractable. We provide an efficient solution to this problem in four steps. (1) We state the underlying problem as a nonconvex optimization problem with infinitely many constraints. (2) A dedicated dualization scheme yields a dual problem that is still nonconvex but has finitely many constraints. (3) We linearize this dual problem and (4) solve the resulting finite linear program to obtain locally optimal solutions to the original problem. The resulting problem formulation is exponentially smaller than those resulting from existing methods. We demonstrate the applicability of our algorithm using large instances of an aircraft collision-avoidance scenario and a novel spacecraft motion planning case study.
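Step (2) of the abstract above can be illustrated on the simplest kind of uncertainty set. The following is a minimal sketch using standard linear-programming duality, not the dedicated dualization scheme of the paper. It assumes a single interval uncertainty set with lower and upper bounds $\underline{p}$ and $\overline{p}$, a fixed value vector $v$, and a threshold $c$; all of these symbols are introduced here for illustration only. If the set $\mathcal{P}$ is nonempty, strong duality turns the infinitely many constraints (one per distribution $p$) into finitely many constraints over the dual variables $\lambda$, $\alpha$, $\beta$:

```latex
\begin{align*}
&\forall p \in \mathcal{P}:\; v^\top p \ge c,
\qquad \mathcal{P} = \{\, p : \underline{p} \le p \le \overline{p},\ \mathbf{1}^\top p = 1 \,\} \\
\iff\;& \exists\, \lambda \in \mathbb{R},\ \alpha \ge 0,\ \beta \ge 0:\quad
\lambda + \underline{p}^\top \alpha - \overline{p}^\top \beta \ge c,
\quad \lambda \mathbf{1} + \alpha - \beta = v
\end{align*}
```

In a policy synthesis problem $v$ is itself a decision variable that is multiplied with the policy variables, which is one way to see why the dual can have finitely many constraints and still be nonconvex.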

“…Ahmadi et al (2020) use control barrier functions to compute safe reachable sets in the belief space of POMDPs. Extensions to epistemic or uncertain POMDPs compute FSCs using convex optimization (Cubuktepe et al, 2021;Suilen et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes

Carr, Jansen, Topcu 2021

JAIR

Self Cite

Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNN) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, which are three orders of magnitude larger than comparable approaches.
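The statement in the abstract above that an FSC applied to a POMDP induces a Markov chain can be sketched as a product construction over (state, memory node) pairs. This is a minimal illustration, not the implementation of the cited paper. The function name, the dictionary layout, and the toy two-state model are all assumptions made for this example, and observations are assumed to be a deterministic function of the state.

```python
from itertools import product


def induced_markov_chain(states, nodes, obs, trans, act_dist, node_update):
    """Build the Markov chain over (state, node) pairs (hypothetical helper).

    obs[s]                  -> observation emitted in state s (deterministic here)
    trans[(s, a)]           -> dict: successor state -> probability
    act_dist[(n, o)]        -> dict: action -> probability chosen by the FSC
    node_update[(n, o, a)]  -> next memory node of the FSC
    """
    chain = {}
    for s, n in product(states, nodes):
        o = obs[s]
        row = {}
        for a, p_action in act_dist[(n, o)].items():
            n_next = node_update[(n, o, a)]
            for s_next, p_trans in trans[(s, a)].items():
                key = (s_next, n_next)
                row[key] = row.get(key, 0.0) + p_action * p_trans
        chain[(s, n)] = row
    return chain


# Toy model (assumed for illustration): both states look the same to the agent.
states = ["left", "right"]
nodes = [0, 1]
obs = {"left": "dark", "right": "dark"}
trans = {
    ("left", "stay"): {"left": 1.0},
    ("right", "stay"): {"right": 1.0},
    ("left", "move"): {"right": 0.9, "left": 0.1},
    ("right", "move"): {"left": 0.9, "right": 0.1},
}
# The FSC alternates: node 0 moves and switches to node 1, node 1 stays and switches back.
act_dist = {(0, "dark"): {"move": 1.0}, (1, "dark"): {"stay": 1.0}}
node_update = {(0, "dark", "move"): 1, (1, "dark", "stay"): 0}

chain = induced_markov_chain(states, nodes, obs, trans, act_dist, node_update)
for source, row in chain.items():
    print(source, row)
```

Each row of the resulting chain is a probability distribution over (state, node) pairs, so a probabilistic model checker could in principle be run on it directly.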

“…As an extension of the first result, in [4], we studied the problem of policy synthesis for uncertain POMDPs. The transition probability function of uncertain POMDPs is only known to belong to a so-called uncertainty set, for instance in the form of probability intervals.…”

Section: CCS Concepts (mentioning)

confidence: 99%
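The last statement above mentions uncertainty sets given as probability intervals. As a minimal sketch of what robustness against such a set means, the following computes the worst-case (smallest) expected value over all distributions that respect the intervals, using the well-known greedy ordering argument for interval sets. The function name and the example numbers are assumptions made for illustration and do not come from the cited papers.

```python
def worst_case_expectation(values, lower, upper):
    """Minimize sum(p[i] * values[i]) subject to lower <= p <= upper and sum(p) == 1.

    Hypothetical helper: start every probability at its lower bound, then hand the
    remaining mass to the successors with the smallest values first.
    """
    if sum(lower) > 1.0 + 1e-12 or sum(upper) < 1.0 - 1e-12:
        raise ValueError("the intervals do not contain any probability distribution")
    p = list(lower)
    budget = 1.0 - sum(lower)
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        extra = min(upper[i] - lower[i], budget)
        p[i] += extra
        budget -= extra
    return sum(pi * vi for pi, vi in zip(p, values)), p


# Example (assumed numbers): two successors with values 0 and 1.
worst, dist = worst_case_expectation(
    values=[0.0, 1.0], lower=[0.2, 0.5], upper=[0.5, 0.8]
)
print(worst, dist)
```

In the example the adversary pushes as much probability as the intervals allow onto the low-value successor, so the worst-case distribution is approximately (0.5, 0.5).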