We study the computational complexity of central analysis problems for One-Counter Markov Decision Processes (OC-MDPs), a class of finitely-presented, countable-state MDPs. OC-MDPs extend finite-state MDPs with an unbounded counter. The counter can be incremented, decremented, or left unchanged during each state transition, and transitions may be enabled or disabled depending both on the current state and on whether the counter value is 0. Some states are "random", from which the next transition is chosen according to a given probability distribution, while other states are "controlled", from which the next transition is chosen by the controller. Different objectives for the controller give rise to different computational problems, aimed at computing optimal achievable objective values and optimal strategies. OC-MDPs are in fact equivalent to a controlled extension of (discrete-time) Quasi-Birth-Death processes (QBDs), a purely stochastic model heavily studied in queueing theory and applied probability. They can thus be viewed as a natural "adversarial" extension of a classic stochastic model. They can also be viewed as a natural probabilistic/controlled extension of classic one-counter automata. OC-MDPs also subsume (as a very restricted special case) a recently studied MDP model called "solvency games", which models a risk-averse gambling scenario. Basic computational questions for OC-MDPs include "termination" questions and "limit" questions, such as the following: does the controller have a strategy to ensure that the counter (which may, for example, count the number of jobs in the queue) will hit value 0 (the empty queue) almost surely (a.s.)? Or that the counter will have lim sup value ∞, a.s.? Or that it will hit value 0 in a selected terminal state, a.s.? Or, in case such properties are not satisfied almost surely, compute their optimal probability over all strategies. We provide new upper and lower bounds on the complexity of such problems. Specifically, we show that several quantitative and almost-sure limit problems can be answered in polynomial time, and that almost-sure termination problems (without selection of desired terminal states) can also be answered in polynomial time. On the other hand, we show that the almost-sure termination problem with selected terminal states is PSPACE-hard, and we provide an exponential-time algorithm for this problem. We also characterize classes of strategies that suffice for optimality in several of these settings. Our upper bounds combine a number of techniques from the theory of MDP reward models, the theory of random walks, and a variety of automata-theoretic methods.

… given probability. Such questions are similar in spirit to questions asked in the rich literature on "adversarial queueing theory" (see, e.g., [4]), although this is a somewhat different setting. These considerations lead naturally to the extension of QBDs with control, and thus to OC-MDPs. Indeed, MDP variants of QBDs have already been studied in the stochastic modeling literature, see [27,19]. However, ...
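To make the model concrete, here is a minimal sketch (the class, interface, and example setup are ours, not the paper's) of an OC-MDP with controlled and random states, together with a Monte-Carlo estimate of the probability that a fixed strategy drives the counter to 0:

```python
import random

# A minimal, hypothetical sketch of an OC-MDP.  Each state is "controlled" or
# "random"; a move is a triple (prob_or_None, next_state, delta) with delta in
# {-1, 0, +1}; random states carry a probability for each move.  In the full
# model the set of enabled moves may also depend on whether the counter is
# currently 0; that test is irrelevant below because we stop as soon as the
# counter hits 0 (the termination event).

class OCMDP:
    def __init__(self, kind, moves):
        self.kind = kind    # state -> "controlled" | "random"
        self.moves = moves  # state -> list of (prob_or_None, next_state, delta)

    def estimate_termination(self, strategy, start, counter,
                             runs=10_000, horizon=10_000):
        """Monte-Carlo estimate of the probability that the counter hits 0
        under a fixed strategy, starting from (start, counter)."""
        hits = 0
        for _ in range(runs):
            s, c = start, counter
            for _ in range(horizon):
                if c == 0:
                    hits += 1
                    break
                options = self.moves[s]
                if self.kind[s] == "controlled":
                    _, s, d = strategy(s, c, options)
                else:
                    weights = [p for p, _, _ in options]
                    _, s, d = random.choices(options, weights=weights)[0]
                c += d
        return hits / runs
```

Here a strategy may in principle consult the entire history; which restricted classes of strategies already suffice for optimality is one of the questions the paper's characterization results address.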
We begin by observing that (discrete-time) Quasi-Birth-Death Processes (QBDs) are equivalent, in a precise sense, to (discrete-time) probabilistic 1-Counter Automata (p1CAs), and that both Tree-Like QBDs (TL-QBDs) and Tree-Structured QBDs (TS-QBDs) are equivalent to both probabilistic Pushdown Systems (pPDSs) and Recursive Markov Chains (RMCs). We then proceed to exploit these connections to obtain a number of new algorithmic upper and lower bounds for central computational problems about these models. Our main result is this: for an arbitrary QBD (even a null-recurrent one), we can approximate its termination probabilities (i.e., its G matrix) to within i bits of precision (i.e., within additive error 1/2^i), in time polynomial in both the encoding size of the QBD and in i, in the unit-cost rational arithmetic RAM model of computation. Specifically, we show that a decomposed Newton's method can be used to achieve this. We emphasize that this bound is very different from the well-known "linear/quadratic convergence" results of numerical analysis, known for QBDs and TL-QBDs, which typically give no constructive bound in terms of the encoding size of the system being solved. In fact, we observe (based on recent results for pPDSs) that for the more general TL-QBDs this bound fails badly. Specifically, in the worst case Newton's method "converges linearly" to the termination probabilities for TL-QBDs, but requires exponentially many iterations in the encoding size of the TL-QBD to approximate these probabilities within any non-trivial constant error c < 1. Our upper bound proof for QBDs combines several ingredients: a detailed analysis of the structure of 1-counter automata, an iterative application of a classic condition-number bound for errors in linear systems, and a very recent constructive bound on the performance of Newton's method for monotone systems of polynomial equations.
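As a rough illustration of the kind of fixed-point computation involved (this is a plain Newton iteration, not the paper's decomposed variant, and it carries none of the paper's complexity guarantees), the following sketch solves the standard QBD matrix equation G = A_down + A_stay·G + A_up·G², starting from G = 0. The block names and the convention (A_down/A_stay/A_up for one-step transitions down, within, and up a level) are assumptions for the example:

```python
import numpy as np

def newton_G(A_down, A_stay, A_up, iters=50, tol=1e-12):
    """Plain Newton iteration for F(G) = A_down + A_stay @ G + A_up @ G @ G - G = 0.

    Starting from G = 0, each step solves the vectorised linear system
    F'(G)[H] = -F(G) for the update H.  Illustration only; not the paper's
    decomposed Newton's method.
    """
    n = A_down.shape[0]
    I = np.eye(n)
    G = np.zeros((n, n))
    for _ in range(iters):
        F = A_down + A_stay @ G + A_up @ G @ G - G
        if np.max(np.abs(F)) < tol:
            break
        # Derivative of F at G applied to H: (A_stay - I) H + A_up H G + A_up G H.
        # Vectorised (column-major vec), using vec(A X B) = (B^T kron A) vec(X):
        J = (np.kron(I, A_stay - I)
             + np.kron(G.T, A_up)
             + np.kron(I, A_up @ G))
        H = np.linalg.solve(J, -F.reshape(-1, order="F")).reshape((n, n), order="F")
        G = G + H
    return G
```

For example, for a one-dimensional random walk with up-probability 0.3 and down-probability 0.7, newton_G(np.array([[0.7]]), np.array([[0.0]]), np.array([[0.3]])) converges to approximately [[1.0]], the probability of ever moving one level down.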
We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.

An ω-word w on an alphabet Σ is a function w : ℕ → Σ. We abbreviate w(i) by w_i. The set of ω-words on Σ is written Σ^ω, and a subset of Σ^ω is an ω-language on Σ. A probability distribution over a finite set S is a function d : S → [0, 1] such that ∑_{s ∈ S} d(s) = 1. Let D(S) denote the set of all discrete distributions over S. We say a distribution d ∈ D(S) is a point distribution if d(s) = 1 for some s ∈ S. For a distribution d ∈ D(S) we write supp(d) := {s ∈ S : d(s) > 0}.
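The distribution preliminaries translate directly into code; a minimal sketch (the function names are ours, chosen only for this illustration):

```python
from typing import Dict, Hashable, Set

# A discrete distribution over a finite set S as a mapping s -> d(s),
# with values in [0, 1] summing to 1.
Distribution = Dict[Hashable, float]

def is_distribution(d: Distribution, eps: float = 1e-9) -> bool:
    """Check d : S -> [0, 1] with sum over s in S of d(s) equal to 1."""
    return all(0.0 <= p <= 1.0 for p in d.values()) and abs(sum(d.values()) - 1.0) < eps

def is_point(d: Distribution) -> bool:
    """A point distribution puts probability 1 on a single element."""
    return any(p == 1.0 for p in d.values())

def supp(d: Distribution) -> Set:
    """supp(d) = { s in S : d(s) > 0 }."""
    return {s for s, p in d.items() if p > 0.0}
```

For instance, supp({"a": 0.5, "b": 0.5, "c": 0.0}) is {"a", "b"}, and the distribution {"a": 1.0, "b": 0.0} is a point distribution.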
We study the computational complexity of Nash equilibria in concurrent games with limit-average objectives. In particular, we prove that the existence of a Nash equilibrium in randomised strategies is undecidable, while the existence of a Nash equilibrium in pure strategies is decidable, even if we put a constraint on the payoff of the equilibrium. Our undecidability result holds even for a restricted class of concurrent games, where nonzero rewards occur only on terminal states. Moreover, we show that the constrained existence problem is undecidable not only for concurrent games but also for turn-based games with the same restriction on rewards. Finally, we prove that the constrained existence problem for Nash equilibria in (pure or randomised) stationary strategies is decidable and analyse its complexity.
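To make the limit-average objective concrete, the following sketch (an invented toy setup, not taken from the paper) computes the limit-average payoff induced by a fixed pure stationary strategy profile in a game with deterministic transitions: under such a profile the play is a "lasso", so the limit-average equals the mean reward along the repeating cycle.

```python
def limit_average(start, profile, step, reward):
    """Limit-average payoff of a pure stationary profile under deterministic transitions.

    profile: state -> tuple of actions (one per player)
    step:    (state, actions) -> next state
    reward:  (state, actions) -> reward collected at this step
    The induced play is eventually periodic; the limit-average payoff is the
    mean reward over the cycle.
    """
    seen = {}        # state -> position of its first visit in the play
    rewards = []
    s = start
    while s not in seen:
        seen[s] = len(rewards)
        a = profile[s]
        rewards.append(reward(s, a))
        s = step(s, a)
    cycle = rewards[seen[s]:]   # rewards along the repeating cycle
    return sum(cycle) / len(cycle)
```

With stochastic transitions or randomised stationary strategies, the analogous computation would instead use the stationary distribution of the induced finite Markov chain.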