Scoping out the future: <i>American Journal of Physiology-Gastrointestinal and Liver Physiology</i>

We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite or unbounded. We provide necessary and sufficient conditions under which the maximum entropy of an MDP is finite, infinite or unbounded. We then present an algorithm which is based on a convex optimization problem to synthesize a policy that maximizes the entropy of an MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate the relation between the restrictions imposed on the paths by a specification, the maximum entropy, and the predictability of paths.

show abstract

Incentive Design for Temporal Logic Objectives

Savas

Gupta

Ornik

et al. 2019

View full text Add to dashboard Cite

We study the problem of designing an optimal sequence of incentives that a principal should offer to an agent so that the agent's optimal behavior under the incentives realizes the principal's objective expressed as a temporal logic formula. We consider an agent with a finite decision horizon and model its decision-making process as a Markov decision process (MDP). Under certain assumptions, we present a polynomialtime algorithm to synthesize an incentive sequence that minimizes the cost to the principal. We show that if the underlying MDP has only deterministic transitions, the principal can hide its objective from the agent and still realize the desired behavior through incentives. On the other hand, an MDP with stochastic transitions may require the principal to share its objective with the agent. Finally, we demonstrate the proposed method in motion planning examples where a principal changes the optimal trajectory of an agent by providing incentives.Y. Savas and U. Topcu are with the Department of Aerospace Engineering, University of Texas at Austin, TX, USA.

show abstract

Unpredictable Planning Under Partial Observability

Hibbard

Savas

Tanaka

et al. 2019

View full text Add to dashboard Cite

We study the problem of synthesizing a controller that maximizes the entropy of a partially observable Markov decision process (POMDP) subject to a constraint on the expected total reward. Such a controller minimizes the predictability of a decision-maker's trajectories while guaranteeing the completion of a task expressed by a reward function. First, we prove that a decision-maker with perfect observations can randomize its paths at least as well as a decision-maker with partial observations. Then, focusing on finite-state controllers, we recast the entropy maximization problem as a so-called parameter synthesis problem for a parametric Markov chain (pMC). We show that the maximum entropy of a POMDP is lower bounded by the maximum entropy of this pMC. Finally, we present an algorithm, based on a nonlinear optimization problem, to synthesize an FSC that locally maximizes the entropy of a POMDP over FSCs with the same number of memory states. In numerical examples, we demonstrate the proposed algorithm on motion planning scenarios.

show abstract

Entropy-Regularized Stochastic Games

Savas

Ahmadi

Tanaka

et al. 2019

View full text Add to dashboard Cite

In two-player zero-sum stochastic games, where two competing players make decisions under uncertainty, a pair of optimal strategies is traditionally described by Nash equilibrium and computed under the assumption that the players have perfect information about the stochastic transition model of the environment. However, implementing such strategies may make the players vulnerable to unforeseen changes in the environment. In this paper, we introduce entropy-regularized stochastic games where each player aims to maximize the causal entropy of its strategy in addition to its expected payoff. The regularization term balances each player's rationality with its belief about the level of misinformation about the transition model. We consider both entropy-regularized N -stage and entropy-regularized discounted stochastic games, and establish the existence of a value in both games. Moreover, we prove the sufficiency of Markovian and stationary mixed strategies to attain the value, respectively, in N -stage and discounted games. Finally, we present algorithms, which are based on convex optimization problems, to compute the optimal strategies. In a numerical example, we demonstrate the proposed method on a motion planning scenario and illustrate the effect of the regularization term on the expected payoff.

show abstract

Minimizing the Information Leakage Regarding High-Level Task Specifications

Hibbard

Savas

et al. 2020

IFAC-PapersOnLine

View full text Add to dashboard Cite

Adaptive backstepping control design for active suspension systems actuated by four-way valve-piston

Savas

Baştürk

2017

View full text Add to dashboard Cite

12 3 4

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Yagiz Savas

Entropy Maximization for Constrained Markov Decision Processes

Design, implementation, and evaluation of a backstepping control algorithm for an active ankle–foot orthosis

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

Incentive Design for Temporal Logic Objectives

Unpredictable Planning Under Partial Observability

Entropy-Regularized Stochastic Games

Minimizing the Information Leakage Regarding High-Level Task Specifications

Adaptive backstepping control design for active suspension systems actuated by four-way valve-piston

Contact Info

Product

Resources

About