Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low-probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk-averse. An alternative approach is to optimize the probability that the payoff is above a certain threshold, which allows obtaining risk-averse policies but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.

…lem the probabilistic constraint is state-based (some states should be avoided) rather than based on the execution-based discounted-sum payoff. This problem was considered in [Santana et al., 2016], but only with deterministic policies. As already noted in [Santana et al., 2016], randomized (or mixed) policies are more powerful.

2. Probability 1 bound. The special case of the EOPG problem with α = 0 has been considered in [Chatterjee et al., 2017]. This formulation represents the case with no risk.

Our Contributions. Our main contributions are as follows:

1. Algorithm. We present a randomized algorithm for approximating, up to any given precision, the optimal solution to the EOPG problem. This is the first approach to solve the EOPG problem for discounted-sum POMDPs.

2. Practical approach. We present a practical approach in which certain searches of our algorithm are performed only up to a time bound. This yields an anytime algorithm that approximates the probabilistic guarantee and then optimizes the expectation.

3. Experimental results. We present experimental results of our algorithm on classical POMDPs.

Due to space constraints, details such as full proofs are deferred to the appendix.

Related Works. POMDPs with discounted-sum payoff have been widely studied, both for theoretical results [Papadimitriou and Tsitsiklis, 1987; Littman, 1996] and in practical tools [Kurniawati et al., 2008; Silver and Veness, 2010; Ye et al., 2017]. Traditionally, expectation optimization has been considered, and recent works consider policies that optimize the probability that the discounted-sum payoff is above a threshold [Hou et al., 2016]. Several problems related to the EOPG problem have been considered before: (a) probability threshold 1 for long-run average and stochastic shortest path objectives in fully-observable MDPs [Bruyère et al., 2014; Randour et al., 2015]; (b) risk bound 0 for discounted-sum payoff in POMDPs [Chatterjee et al., 2017]; and (c) general probability thresholds for long-run average payo...
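For concreteness, the EOPG problem described above can be stated as a constrained optimization over policies. The following is a minimal sketch in assumed notation (discount factor $\gamma$, per-step rewards $r_i$, threshold $\mathit{thr}$, risk bound $\alpha$, with $\alpha = 0$ corresponding to the probability-1 case), not necessarily the notation used in the paper itself:
\[
\mathrm{Disc} \;=\; \sum_{i \ge 0} \gamma^{i}\, r_i,
\qquad
\sup_{\sigma}\; \mathbb{E}^{\sigma}\bigl[\mathrm{Disc}\bigr]
\quad \text{subject to} \quad
\mathbb{P}^{\sigma}\bigl[\mathrm{Disc} \ge \mathit{thr}\bigr] \;\ge\; 1 - \alpha,
\]
where the supremum ranges over (possibly randomized) policies $\sigma$ of the POMDP. Pure expectation optimization is recovered with $\alpha = 1$ (no constraint), and pure threshold-probability optimization corresponds to ignoring the expectation term and maximizing only the constrained probability.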