2022
DOI: 10.1007/978-3-030-94583-1_7
Gradient-Descent for Randomized Controllers Under Partial Observability

Cited by 10 publications (11 citation statements)
References 43 publications
“…Data availability. The tools used and data generated in our experimental evaluation are archived at DOI 10.5281/5568910 [26].…”
Section: Discussion (mentioning)
confidence: 99%
“…A variety of approaches have been investigated, including a branch-and-bound algorithm [14] or mixed-integer linear programming (MILP) formulations [15,16]. Alternatively, one may search for randomised controllers via gradient descent [17] or via convex optimization [18–20]. Randomized controllers can also be extracted via deep reinforcement learning [21].…”
Section: Belief Exploration in Storm (mentioning)
confidence: 99%
“…The concrete start beliefs B_start are determined via method select-beliefs (discussed below). Then, for each element, we execute an inductive search (l. 6–18) for the time given by t_I/size(B_set), meaning we split the time for inductive search uniformly among all selected beliefs. We always produce an FSC from the initial belief and for this belief, we use the knowledge on how to treat memory updates (l. 13) and action prioritization (l. 14) in the same way as in Alg.…”
Section: Algorithm Overview (mentioning)
confidence: 99%
“…Parameter synthesis is to find the right values for the unknown parameters with respect to a given constraint. Various synthesis techniques have been developed for parametric Markov chains (pMCs) ranging over e.g., the gradient-based methods [Heck et al, 2022], convex optimization [Cubuktepe et al, 2018; Cubuktepe et al, 2022], and region verification [Quatmann et al, 2016]. Recently, Salmani and Katoen [2021a] have proposed a translation from pBNs to pMCs that facilitates using pMC algorithms to analyze pBNs.…”
(mentioning)
confidence: 99%
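The citation statements above mention gradient-based parameter synthesis for parametric Markov chains (pMCs). A minimal sketch of the idea, on a hypothetical one-parameter toy chain (the chain, the function `reach_prob`, and all constants are illustrative assumptions, not taken from the cited papers): the reachability probability is a rational function of the parameter, so a simple projected gradient ascent can search for a parameter value satisfying a constraint.

```python
# Toy pMC: s0 --p--> s1, s0 --(1-p)--> sink; s1 --(1-p)--> goal, s1 --p--> sink.
# Pr(reach goal) = p * (1 - p), a rational function of the parameter p.
def reach_prob(p):
    return p * (1.0 - p)

def grad(p, h=1e-6):
    # Central finite difference; real tools differentiate the rational
    # function symbolically instead.
    return (reach_prob(p + h) - reach_prob(p - h)) / (2 * h)

# Projected gradient ascent: maximize the reachability probability while
# keeping p a valid (non-degenerate) transition probability.
p, lr = 0.1, 0.5
for _ in range(200):
    p += lr * grad(p)
    p = min(max(p, 1e-6), 1 - 1e-6)

print(round(p, 3))  # converges to about 0.5, the maximizer of p*(1-p)
```

For this toy objective the optimum is analytic (p = 0.5); the point of the sketch is only the loop structure — evaluate the induced probability, take a gradient step, project back into the parameter region — which is the shape shared by the gradient-based methods the quotes refer to.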