We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown policy iteration style algorithms have exponential lower bounds in a two player game setting. We extend these lower bounds to Markov decision processes with the total reward and average-reward optimality criteria.
Abstract.A recent result has shown that reachability in two-clock timed automata is log-space equivalent to reachability in bounded onecounter automata [6]. We show that reachability in bounded one-counter automata is PSPACE-complete.
Parity games play an important role in model checking and synthesis. In their paper, Calude et al. have recently shown that these games can be solved in quasi-polynomial time. We show that their algorithm can be implemented efficiently: we use their data structure as a progress measure, allowing for a backward implementation instead of a complete unravelling of the game. To achieve this, a number of changes have to be made to their techniques, where the main one is to add power to the antagonistic player that allows for determining her rational move without changing the outcome of the game. We provide a first implementation for a quasi-polynomial algorithm, test it on small examples, and provide a number of side results, including minor algorithmic improvements, a quasi bi-linear complexity in the number of states and edges for a fixed number of colours, and matching lower bounds for the algorithm of Calude et al.
A recent body of experimental literature has studied empirical game-theoretical analysis, in which we have partial knowledge of a game, consisting of observations of a subset of the pure-strategy profiles and their associated payoffs to players. The aim is to find an exact or approximate Nash equilibrium of the game, based on these observations. It is usually assumed that the strategy profiles may be chosen in an on-line manner by the algorithm. We study a corresponding computational learning model, and the query complexity of learning equilibria for various classes of games. We give basic results for exact equilibria of bimatrix and graphical games. We then study the query complexity of approximate equilibria in bimatrix games. Finally, we study the query complexity of exact equilibria in symmetric network congestion games. For directed acyclic networks, we can learn the cost functions (and hence compute an equilibrium) while querying just a small fraction of pure-strategy profiles. For the special case of parallel links, we have the stronger result that an equilibrium can be identified while only learning a small fraction of the cost values.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.