We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. A key feature of our technique is the compilation of ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.

An ω-word w on an alphabet Σ is a function w : ℕ → Σ. We abbreviate w(i) by w_i. The set of ω-words on Σ is written Σ^ω, and a subset of Σ^ω is an ω-language on Σ.

A probability distribution over a finite set S is a function d : S → [0, 1] such that ∑_{s∈S} d(s) = 1. Let D(S) denote the set of all discrete distributions over S. We say a distribution d ∈ D(S) is a point distribution if d(s) = 1 for some s ∈ S. For a distribution d ∈ D(S) we write supp(d) := {s ∈ S : d(s) > 0}.
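The definitions of a probability distribution, a point distribution, and the support can be made concrete with a minimal Python sketch. The dictionary representation and the helper names `is_distribution`, `is_point_distribution`, and `supp` are illustrative assumptions and do not come from the paper; exact fractions are used so that the sum-to-one check involves no rounding.

```python
from fractions import Fraction

def is_distribution(d):
    """Check that d maps each element of S into [0, 1] and sums to 1."""
    return all(0 <= p <= 1 for p in d.values()) and sum(d.values()) == 1

def supp(d):
    """Support of d: the elements with strictly positive probability."""
    return {s for s, p in d.items() if p > 0}

def is_point_distribution(d):
    """A point distribution puts probability 1 on a single element."""
    return is_distribution(d) and any(p == 1 for p in d.values())

# Hypothetical example distributions over S = {s0, s1, s2}.
fair = {"s0": Fraction(1, 2), "s1": Fraction(1, 2), "s2": Fraction(0)}
point = {"s0": Fraction(1), "s1": Fraction(0), "s2": Fraction(0)}
```

Here `fair` has a two-element support, while `point` is a point distribution whose support is a single element.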
Constant-rate multi-mode systems are hybrid systems that can switch freely among a finite set of modes, and whose dynamics are specified by a finite number of real-valued variables with mode-dependent constant rates. The schedulability problem for such systems is to design a mode-switching policy that maintains the state within a specified safety set. The main result of the paper is that schedulability can be decided in polynomial time. We also generalize our result to optimal schedulability problems with average cost and reachability cost objectives. Polynomial-time scheduling algorithms make this class an appealing formal model for the design of energy-optimal policies. The key to tractability is that the only constraints on when a scheduler can switch the mode are specified by global objectives. Adding local constraints, by associating either invariants with modes or guards with mode switches, leads to undecidability, and requiring the scheduler to make decisions only at multiples of a given sampling rate leads to a PSPACE-complete schedulability problem.
The theory of regular transformations of finite strings is quite mature with appealing properties. This class can be equivalently defined using both logic (monadic second-order logic, MSO) and finite-state machines (two-way transducers, and more recently, streaming string transducers); is closed under operations such as sequential composition and regular choice; and problems such as functional equivalence and type checking are decidable for this class. In this paper, we initiate a study of transformations of infinite strings. The MSO-based definition for regular string transformations generalizes naturally to infinite strings. We define an equivalent generalization of the machine model of streaming string transducers to infinite strings. A streaming string transducer is a deterministic machine that makes a single pass over the input string, and computes the output fragments using a finite set of string variables that are updated in a copyless manner at each step. We show how the Muller acceptance condition for automata over infinite strings can be generalized to associate an infinite output string with an infinite execution. The proof that our model captures all MSO-definable transformations uses two-way transducers. Unlike the case of finite strings, the MSO-equivalent definition of two-way transducers over infinite strings needs to make decisions based on ω-regular look-ahead. Simulating this look-ahead using multiple variables with copyless updates is the main technical challenge in our constructions. Finally, we show that type checking and functional equivalence are decidable for MSO-definable transformations of infinite strings.
An average-time game is played on the infinite graph of configurations of a finite timed automaton. The two players, Min and Max, construct an infinite run of the automaton by taking turns to perform a timed transition. Player Min wants to minimise the average time per transition and player Max wants to maximise it. A solution of average-time games is presented using a reduction to average-price games on a finite graph. A direct consequence is an elementary proof of determinacy for average-time games. This complements our results for reachability-time games and partially solves a problem posed by Bouyer et al., to design an algorithm for solving average-price games on priced timed automata. The paper also establishes the exact computational complexity of solving average-time games: the problem is EXPTIME-complete for timed automata with at least two clocks.
We characterize the class of nondeterministic ω-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata 'good-for-MDPs' (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties (they are Büchi automata with low branching degree obtained through a simple construction) and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.