This paper considers parametric Markov decision processes (pMDPs) whose transitions are equipped with affine functions over a finite set of parameters. The synthesis problem is to find a parameter valuation such that the instantiated pMDP satisfies a (temporal logic) specification under all strategies. We show that this problem can be formulated as a quadratically constrained quadratic program (QCQP) and is non-convex in general. To deal with the NP-hardness of such problems, we exploit a convex-concave procedure (CCP) to iteratively obtain local optima. An appropriate interplay between CCP solvers and probabilistic model checkers creates a procedure, realized in the tool PROPheSY, that solves the synthesis problem for models with thousands of parameters.
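To make the synthesis problem concrete, the following minimal sketch instantiates a toy pMDP at a given parameter value and checks a reachability specification under all strategies. It is only a brute-force check of one candidate valuation, not the QCQP/CCP procedure of the paper. The model `PMDP`, the single parameter `p`, the threshold, and all function names are hypothetical and chosen for illustration.

```python
# Toy pMDP with one parameter p. Each transition probability is an affine
# function c0 + c1 * p, stored as the pair (c0, c1). All names are hypothetical.
PMDP = {
    "s0": {
        "a": {"goal": (0.0, 1.0), "fail": (1.0, -1.0)},
        "b": {"goal": (0.3, 0.0), "s0": (0.0, 0.5), "fail": (0.7, -0.5)},
    },
    "goal": {},  # no actions: absorbing
    "fail": {},  # no actions: absorbing
}


def instantiate(pmdp, p):
    """Plug a parameter value into every affine transition function."""
    mdp = {}
    for state, actions in pmdp.items():
        mdp[state] = {}
        for action, successors in actions.items():
            dist = {t: c0 + c1 * p for t, (c0, c1) in successors.items()}
            if any(q < 0.0 or q > 1.0 for q in dist.values()):
                raise ValueError("valuation yields a value outside [0, 1]")
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError("valuation yields a non-stochastic action")
            mdp[state][action] = dist
    return mdp


def min_reach_probability(mdp, target, init, sweeps=1000):
    """Value iteration from 0 for the minimal probability of reaching target."""
    value = {s: (1.0 if s == target else 0.0) for s in mdp}
    for _ in range(sweeps):
        updated = dict(value)
        for s, actions in mdp.items():
            if s != target and actions:
                # The worst-case strategy picks the action with the lowest value.
                updated[s] = min(
                    sum(q * value[t] for t, q in dist.items())
                    for dist in actions.values()
                )
        value = updated
    return value[init]


def satisfies(pmdp, p, threshold):
    """True if every strategy reaches 'goal' with probability >= threshold."""
    mdp = instantiate(pmdp, p)
    return min_reach_probability(mdp, "goal", "s0") >= threshold
```

In this toy model, `satisfies(PMDP, 0.5, 0.35)` holds, whereas `p = 0.2` fails because action `a` alone reaches the goal with probability only 0.2. A synthesis procedure searches for such a valuation rather than testing candidates one by one.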
Multi-objective verification problems of parametric Markov decision processes under optimality criteria can be naturally expressed as nonlinear programs. We observe that many of these computationally demanding problems belong to the subclass of signomial programs. This insight allows for a sequential optimization algorithm to efficiently compute sound but possibly suboptimal solutions. Each stage of this algorithm solves a geometric programming problem. These geometric programs are obtained by convexifying the nonconvex constraints of the original problem. Direct applications of the encodings as nonlinear programs are model repair and parameter synthesis. We demonstrate the scalability and quality of our approach on well-known benchmarks.
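One common way to convexify a signomial constraint into geometric-programming form is to replace a posynomial by a monomial lower bound obtained from the weighted arithmetic-geometric mean inequality (often called condensation). The sketch below illustrates that generic step only; whether it matches the exact convexification used in the paper is an assumption, and the function names and the term representation are hypothetical.

```python
import math


def eval_posynomial(terms, x):
    """Evaluate sum_k c_k * prod_i x_i**a_ki; terms are (c_k, exponents)."""
    return sum(
        c * math.prod(xi ** a for xi, a in zip(x, exps)) for c, exps in terms
    )


def eval_monomial(coef, exps, x):
    """Evaluate coef * prod_i x_i**exps_i."""
    return coef * math.prod(xi ** a for xi, a in zip(x, exps))


def condense(terms, x0):
    """Monomial lower bound of a posynomial that is tight at the point x0.

    With weights w_k = u_k(x0) / g(x0), weighted AM-GM gives
    g(x) >= prod_k (u_k(x) / w_k) ** w_k for all x > 0.
    """
    total = eval_posynomial(terms, x0)
    weights = [eval_monomial(c, exps, x0) / total for c, exps in terms]
    coef = math.prod((c / w) ** w for (c, _), w in zip(terms, weights))
    exps = tuple(
        sum(w * e[i] for (_, e), w in zip(terms, weights))
        for i in range(len(x0))
    )
    return coef, exps


# Hypothetical posynomial g(x, y) = x + 2*y + 0.5*x*y, condensed at (0.5, 0.25).
G_TERMS = [(1.0, (1, 0)), (2.0, (0, 1)), (0.5, (1, 1))]
G_COEF, G_EXPS = condense(G_TERMS, (0.5, 0.25))
```

For a constraint of the form f(x)/g(x) <= 1 with posynomials f and g, substituting the monomial bound for g yields a posynomial constraint that is at least as strict as the original. Any point feasible for the approximation is therefore feasible for the original constraint, which is consistent with solutions that are sound but possibly suboptimal.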
We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite or unbounded, and we provide necessary and sufficient conditions for each of these cases. We then present an algorithm, based on a convex optimization problem, to synthesize a policy that maximizes the entropy of an MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate the relation between the restrictions imposed on the paths by a specification, the maximum entropy, and the predictability of paths.
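As an illustration of the quantity being maximized, the sketch below evaluates the entropy of the state paths generated by a fixed policy, using the standard decomposition of path entropy into expected state visits times local (one-step) entropy. It evaluates a given policy only and does not implement the convex program described above. The toy MDP, the policies, and all names are hypothetical, and the truncated sum is meaningful only when the entropy is finite.

```python
import math

# Hypothetical MDP: state -> action -> {successor: probability}.
# States without actions are treated as absorbing.
MDP = {
    "s0": {
        "a": {"end": 1.0},
        "b": {"s0": 0.5, "end": 0.5},
    },
    "end": {},
}


def induced_chain(mdp, policy):
    """Markov chain obtained by fixing a (possibly randomized) policy."""
    chain = {}
    for state, actions in mdp.items():
        dist = {}
        for action, successors in actions.items():
            weight = policy[state].get(action, 0.0)
            for t, q in successors.items():
                dist[t] = dist.get(t, 0.0) + weight * q
        chain[state] = dist
    return chain


def local_entropy(dist):
    """Entropy (in nats) of a single transition distribution."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0.0)


def path_entropy(chain, init, sweeps=2000):
    """Sum over time of (probability of being in s) * (local entropy of s)."""
    mass = {init: 1.0}
    entropy = 0.0
    for _ in range(sweeps):
        following = {}
        for s, m in mass.items():
            successors = chain[s]
            if not successors:
                continue  # absorbing state: the path has ended
            entropy += m * local_entropy(successors)
            for t, q in successors.items():
                following[t] = following.get(t, 0.0) + m * q
        mass = following
    return entropy


H_A = path_entropy(induced_chain(MDP, {"s0": {"a": 1.0}}), "s0")
H_B = path_entropy(induced_chain(MDP, {"s0": {"b": 1.0}}), "s0")
```

Here the policy that always plays `a` generates a single path and has zero entropy, while always playing `b` yields a geometric number of self-loops and an entropy of 2 ln 2 nats. The less predictable policy is the one with the larger path entropy.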