# Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints (2020)

**Abstract:** We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite or unbounded. We provide necessary and sufficient conditions under which the maximum entrop…
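As an illustration of the quantity the paper maximizes (not its synthesis algorithm), the entropy of the path distribution induced by a fixed policy on an MDP with absorbing goal states decomposes as the expected residence time of each transient state weighted by the local entropy of its transition distribution. The function names and the toy chain below are our own:

```python
import numpy as np

def local_entropy(row):
    # Shannon entropy (in bits) of one state's transition distribution.
    p = row[row > 0]
    return float(-(p * np.log2(p)).sum())

def path_entropy(P, transient, init):
    # Entropy of the path distribution of an absorbing Markov chain:
    #   H = sum_s xi_s * H(P(s, .)),
    # where xi_s is the expected number of visits to transient state s
    # (a standard decomposition via the fundamental matrix).
    Q = P[np.ix_(transient, transient)]
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # fundamental matrix
    xi = init @ N                                  # expected visit counts
    return float(sum(x * local_entropy(P[s]) for x, s in zip(xi, transient)))

# Toy 3-state chain induced by some fixed policy (hypothetical numbers):
# state 2 is absorbing; state 0 branches uniformly to states 1 and 2.
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
print(path_entropy(P, transient=[0, 1], init=np.array([1.0, 0.0])))  # 1.0 bit
```

An entropy-maximizing policy, in this view, is one that makes the induced chain's path distribution as close to uniform over admissible paths as the constraints allow.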

**13 citation statements · 101 reference statements**

“…Under the reachability constraint, the maximum entropy of the MDP given in Figure 5 is unbounded. For policy synthesis, we follow the procedure given in [8] and impose an upper bound on the expected total state residence time Γ. As the bound increases, the maximum entropy value of the MDP increases.…”

confidence: 99%
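The excerpt's point — that the maximum entropy grows with the bound Γ on expected total state residence time — can be seen on the smallest possible example: a single transient state that self-loops with probability p before absorbing. The expected residence time is Γ = 1/(1−p) and the path entropy is Γ·H(p), which grows without bound as p → 1. The numbers and function names below are ours, not from [8]:

```python
import numpy as np

def binary_entropy(p):
    # Entropy (bits) of the {stay, leave} choice with stay-probability p.
    return 0.0 if p in (0.0, 1.0) else float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def selfloop_path_entropy(p):
    # One transient state that self-loops w.p. p, then absorbs:
    # expected residence time Gamma = 1/(1-p), path entropy = Gamma * H(p).
    return binary_entropy(p) / (1.0 - p)

for p in (0.5, 0.9, 0.99):
    print(f"Gamma = {1 / (1 - p):6.1f}  path entropy = {selfloop_path_entropy(p):7.3f} bits")
```

Letting p approach 1 makes Γ (and with it the path entropy) arbitrarily large, which is why imposing an upper bound on Γ is needed to keep the synthesis problem well-posed.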

“…The methods introduced in [4] and [5] use the Fisher information to preserve privacy for database systems and smart meters, respectively, and they do not deal with MDPs. Planning in stochastic control settings in the presence of an adversary has been substantially explored previously; the works closest to our paper are [6]- [8]. The reference [6] provides a method for multi-agent perimeter patrolling scenarios and is not applicable to MDPs in general.…”

confidence: 99%

“…The principle of maximum causal entropy extends the maximum entropy principle to settings where there is dynamically revealed side information that causally affects the evolution of a stochastic process [5], [6]. A distribution that maximizes the causal entropy of a stochastic process (in the absence of additional constraints) is the one that makes all admissible realizations equally probable regardless of the revealed information [7]. Therefore, the causal entropy of a player's strategy provides a convenient way to quantify the dependence of its strategy on its level of information about the environment as well as on the other player's strategy.…”

confidence: 99%

“…Related Work. A recent study [5] showed that an entropy-maximizing controller for an MDP could be synthesized efficiently by solving a convex optimization problem. In POMDPs, entropy has often been used for active sensing applications [11]- [13], where an agent seeks to select actions that maximize its information gain from the environment.…”

confidence: 99%
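The convexity the excerpt alludes to is the reason entropy maximization under linear constraints is tractable: negative entropy is convex, and the feasible set is a polytope. The program in [5] is posed over MDP state-action occupancy measures; the toy program below (our own, much simpler formulation) only maximizes the entropy of a distribution over four outcomes subject to a linear moment constraint:

```python
import numpy as np
from scipy.optimize import minimize

vals = np.array([0.0, 1.0, 2.0, 3.0])  # outcome values (hypothetical)

def neg_entropy(p):
    # Negative Shannon entropy (nats); convex, so minimizing it is well-posed.
    p = np.clip(p, 1e-12, 1.0)
    return float((p * np.log(p)).sum())

res = minimize(
    neg_entropy,
    x0=np.full(4, 0.25),
    constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0},   # probabilities
                 {"type": "eq", "fun": lambda p: p @ vals - 1.0}], # mean constraint
    bounds=[(0.0, 1.0)] * 4,
    method="SLSQP",
)
print(res.x)  # max-entropy distribution with mean 1.0
```

Since the required mean (1.0) is below the uniform mean (1.5), the solution has the Gibbs form p_i ∝ exp(−λ·v_i) with λ > 0, i.e. the probabilities decrease in the outcome value.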