A decision process in which rewards depend on history rather than merely on
the current state is called a decision process with non-Markovian rewards
(NMRDP). In decision-theoretic planning, where many desirable behaviours are
more naturally expressed as properties of execution sequences rather than as
properties of states, NMRDPs form a more natural model than the commonly
adopted fully Markovian decision process (MDP) model. While the more tractable
solution methods developed for MDPs do not directly apply in the presence of
non-Markovian rewards, a number of solution methods for NMRDPs have been
proposed in the literature. These all exploit a compact specification of the
non-Markovian reward function in temporal logic, to automatically translate the
NMRDP into an equivalent MDP which is solved using efficient MDP solution
methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process
Planner), a software platform for developing and experimenting with
methods for decision-theoretic planning with non-Markovian rewards. The current
version of NMRDPP implements, under a single interface, a family of methods
based on existing as well as new approaches which we describe in detail. These
include dynamic programming, heuristic search, and structured methods. Using
NMRDPP, we compare the methods and identify certain problem features that
affect their performance. NMRDPP's treatment of non-Markovian rewards is
inspired by the treatment of domain-specific search control knowledge in the
TLPlan planner, which it incorporates as a special case. In the First
International Probabilistic Planning Competition, NMRDPP was able to compete
and perform well in both the domain-independent and hand-coded tracks, using
search control knowledge in the latter.
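
The idea of translating an NMRDP into an equivalent MDP can be illustrated on a toy problem. The sketch below is not NMRDPP's method: it hand-codes a single history flag instead of deriving the expanded state from a temporal logic formula, and the chain of states, the dynamics, and all function names are hypothetical. It only shows that once the state carries the relevant piece of history, the reward becomes a function of the (expanded) state alone.

```python
from itertools import product

# Hypothetical toy problem: a chain of states 0..3 with two actions.
ACTIONS = (-1, +1)  # step left / step right


def step(s, a):
    """Deterministic toy dynamics: move one cell and clip to the chain."""
    return min(max(s + a, 0), 3)


def reward_nm(history):
    """Non-Markovian reward: 1.0 when the last state is 3 and
    state 0 was visited at some earlier point of the history."""
    return 1.0 if history[-1] == 3 and 0 in history[:-1] else 0.0


def expand(e, s_next):
    """Update an expanded state e = (s, seen0), where seen0 records
    whether state 0 occurred strictly before the current state."""
    s, seen0 = e
    return (s_next, seen0 or s == 0)


def reward_markov(e):
    """Markovian reward over expanded states: depends on e only."""
    s, seen0 = e
    return 1.0 if s == 3 and seen0 else 0.0


def check_equivalence(start, horizon):
    """Compare both reward definitions along every action sequence."""
    for actions in product(ACTIONS, repeat=horizon):
        history = [start]
        e = (start, False)
        for a in actions:
            s_next = step(history[-1], a)
            e = expand(e, s_next)
            history.append(s_next)
            if reward_nm(history) != reward_markov(e):
                return False
    return True
```

Calling `check_equivalence(1, 8)` compares the two rewards on every action sequence of length 8 from state 1. The expanded model has at most twice as many states as the original, and standard MDP solution methods could then be applied to it.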
In this paper, we present a powerful framework for describing, storing, and reasoning about infinite temporal information. This framework is an extension of classical relational databases. It represents infinite temporal information by generalized tuples defined by linear repeating points and constraints on these points. We prove that relations formed from generalized tuples are closed under the operations of relational algebra. A characterization of the expressiveness of generalized relations is given in terms of predicates definable in Presburger arithmetic. Finally, we provide some complexity results.
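
As a rough illustration of how a finite object can stand for infinitely many time points, the sketch below assumes the usual reading of a linear repeating point as a set {c + kn : n an integer} with period k > 0. The class `LRP` and the function `intersect` are hypothetical names, not taken from the paper, and constraints between points are left out. Intersecting two such sets gives either the empty set or another set of the same form, which hints at why closure under relational operations is plausible.

```python
from math import gcd
from typing import NamedTuple, Optional


class LRP(NamedTuple):
    """Hypothetical linear repeating point {offset + period * n : n in Z}."""
    offset: int
    period: int  # assumed to be strictly positive

    def contains(self, t: int) -> bool:
        return (t - self.offset) % self.period == 0


def intersect(a: LRP, b: LRP) -> Optional[LRP]:
    """Return the common points of a and b as an LRP, or None if empty."""
    g = gcd(a.period, b.period)
    # The two congruences are compatible only if the offsets agree modulo g.
    if (b.offset - a.offset) % g != 0:
        return None
    lcm = a.period // g * b.period
    # Scan one full period of the result, stepping through points of a.
    first = a.offset % a.period
    for t in range(first, first + lcm, a.period):
        if b.contains(t):
            return LRP(t % lcm, lcm)
    return None
```

For example, `intersect(LRP(1, 4), LRP(3, 6))` yields `LRP(9, 12)`, the points that are both 1 modulo 4 and 3 modulo 6, while `intersect(LRP(0, 2), LRP(1, 2))` yields `None` because even and odd points never coincide.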
This paper describes a synthesis method that automatically derives controllers for timed discrete-event systems with nonterminating behavior modeled by timed transition graphs and specifications of control requirements expressed by metric temporal logic (MTL) formulas. Synthesis is performed by using 1) a forward-chaining search that evaluates the satisfiability of MTL formulas over sequences of states generated by occurrences of actions and 2) a control-directed backtracking technique that takes into consideration the controllability of actions. This method has several interesting features. First, the issues of controllability, safety, liveness, and real time are integrated in a single framework. Second, the synthesis process does not require explicit storage of an entire transition structure over which formulas are checked and can be stopped at any moment, giving an approximate but useful result. Third, search and control mechanisms allow circumvention of the state explosion problem.