2020
DOI: 10.1109/tac.2019.2922583

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

Abstract: We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded. We provide necessary and sufficient conditions under which the maximum entrop…
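To make the abstract's problem statement concrete, one plausible formalization is the following (our gloss, not quoted from the paper; the exact form of the temporal logic constraint, e.g., the specification φ and the satisfaction threshold β, is an assumption):

$$\underset{\pi}{\text{maximize}} \;\; H\!\left(\mathcal{M}^{\pi}\right) \quad \text{subject to} \quad \Pr^{\pi}_{\mathcal{M}}\!\left(\models \varphi\right) \ge \beta,$$

where $H(\mathcal{M}^{\pi})$ denotes the entropy of the distribution over paths that the policy π induces on the MDP $\mathcal{M}$, φ is the temporal logic specification, and β is a required satisfaction probability.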

Cited by 25 publications (13 citation statements)
References 40 publications
“…Under the reachability constraint, the maximum entropy of the MDP given in Figure 5 is unbounded. For policy synthesis, we follow the procedure given in [8] and impose an upper bound on the expected total state residence time Γ. As the bound increases, the maximum entropy value of the MDP increases.…”
Section: B. Inference of Local Behavior
confidence: 99%
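In symbols, one reading of the construction this excerpt describes (the bound Γ and the residence-time terminology follow the excerpt; the notation $\xi_s$ is ours) is: with $\xi_s$ the expected total time the policy spends in state $s$, the synthesis problem adds the constraint

$$\sum_{s \in S} \xi_s \le \Gamma,$$

so that tightening Γ makes the otherwise unbounded maximum entropy finite, and relaxing Γ lets the achievable entropy grow, as the excerpt notes.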
“…The methods introduced in [4] and [5] use the Fisher information to preserve privacy for database systems and smart meters, respectively, and they do not deal with MDPs. Planning in stochastic control settings in the presence of an adversary has been substantially explored previously; the works closest to our paper are [6]-[8]. The reference [6] provides a method for multi-agent perimeter patrolling scenarios and is not applicable to MDPs in general.…”
Section: Introduction
confidence: 99%
“…The principle of maximum causal entropy extends the maximum entropy principle to settings where there is dynamically revealed side information that causally affects the evolution of a stochastic process [5], [6]. A distribution that maximizes the causal entropy of a stochastic process (in the absence of additional constraints) is the one that makes all admissible realizations equally probable regardless of the revealed information [7]. Therefore, the causal entropy of a player's strategy provides a convenient way to quantify the dependence of its strategy on its level of information about the environment as well as on the other player's strategy.…”
Section: Introduction
confidence: 99%
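For reference, the causally conditioned entropy this excerpt appeals to is standardly written (our addition, following the usual definition in the maximum causal entropy literature; the notation is ours) as

$$H\!\left(X^{T} \,\|\, Y^{T}\right) = \sum_{t=1}^{T} H\!\left(X_t \mid X^{t-1}, Y^{t}\right),$$

i.e., each $X_t$ is conditioned only on the information revealed up to time $t$, which is exactly the causal restriction the excerpt describes.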
“…Related Work. A recent study [5] showed that an entropy-maximizing controller for an MDP could be synthesized efficiently by solving a convex optimization problem. In POMDPs, entropy has often been used for active sensing applications [11]-[13], where an agent seeks to select actions that maximize its information gain from the environment.…”
Section: Introduction
confidence: 99%
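To illustrate the kind of convex program this excerpt refers to, below is a minimal sketch in Python using cvxpy. It is not the cited paper's exact formulation (which maximizes the entropy of the path distribution under a temporal logic constraint, working with expected residence times); instead it solves the closely related problem of maximizing the discounted causal entropy of a stationary policy over occupancy measures. The toy MDP numbers are made up for illustration.

import numpy as np
import cvxpy as cp

# Toy MDP (hypothetical): 3 states, 2 actions, discount factor gamma.
nS, nA = 3, 2
gamma = 0.95
mu0 = np.array([1.0, 0.0, 0.0])  # initial state distribution

# P[a][s, s'] = transition probability under action a (made-up numbers).
P = np.zeros((nA, nS, nS))
P[0] = [[0.8, 0.2, 0.0], [0.0, 0.5, 0.5], [0.1, 0.0, 0.9]]
P[1] = [[0.1, 0.1, 0.8], [0.3, 0.7, 0.0], [0.0, 0.2, 0.8]]

x = cp.Variable((nS, nA), nonneg=True)  # occupancy measures x[s, a]
xs = cp.sum(x, axis=1)                  # state occupancies x[s]

# Bellman-flow constraints defining valid discounted occupancy measures:
#   x[s] = mu0[s] + gamma * sum_{s', a'} P(s | s', a') * x[s', a'].
flow = []
for s in range(nS):
    M = np.stack([P[a][:, s] for a in range(nA)], axis=1)  # M[s', a] = P(s|s', a)
    flow.append(xs[s] == mu0[s] + gamma * cp.sum(cp.multiply(x, M)))

# Discounted causal entropy of the induced policy:
#   sum_{s, a} -x[s, a] * log(x[s, a] / x[s]),  which is concave in x.
xs_tiled = cp.vstack([xs] * nA).T  # broadcast x[s] across actions
objective = cp.Maximize(-cp.sum(cp.rel_entr(x, xs_tiled)))

prob = cp.Problem(objective, flow)
prob.solve()

# Recover the entropy-maximizing policy: pi(a | s) = x[s, a] / x[s].
# (Safe here because every state has positive occupancy in this toy MDP.)
pi = x.value / x.value.sum(axis=1, keepdims=True)
print("max discounted causal entropy:", prob.value)
print("policy:\n", pi)

The program is convex because the relative-entropy atom cp.rel_entr is jointly convex in its arguments and the flow constraints are affine, so an off-the-shelf exponential-cone solver handles it directly; this mirrors the excerpt's point that entropy-maximizing controller synthesis reduces to convex optimization.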