2020
DOI: 10.1109/tac.2019.2922583

Entropy Maximization for Markov Decision Processes Under Temporal Logic Constraints

Abstract: We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or dually, maximizes the exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded. We provide necessary and sufficient conditions under which the maximum entrop…
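To make the abstract's problem statement concrete, one plausible formalization is the following (our gloss, not quoted from the paper; the exact form of the temporal logic constraint, e.g., the specification φ and the satisfaction threshold β, is an assumption):

$$\underset{\pi}{\text{maximize}} \;\; H\!\left(\mathcal{M}^{\pi}\right) \quad \text{subject to} \quad \Pr^{\pi}_{\mathcal{M}}\!\left(\models \varphi\right) \ge \beta,$$

where $H(\mathcal{M}^{\pi})$ denotes the entropy of the distribution over paths that the policy π induces on the MDP $\mathcal{M}$, φ is the temporal logic specification, and β is a required satisfaction probability.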

Cited by 25 publications (13 citation statements)
References 40 publications
“…Under the reachability constraint, the maximum entropy of the MDP given in Figure 5 is unbounded. For policy synthesis, we follow the procedure given in [8] and impose an upper bound on the expected total state residence time Γ. As the bound increases, the maximum entropy value of the MDP increases.…”
Section: B. Inference of Local Behavior
confidence: 99%
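In symbols, one reading of the construction this excerpt describes (the bound Γ and the residence-time terminology follow the excerpt; the notation $\xi_s$ is ours) is: with $\xi_s$ the expected total time the policy spends in state $s$, the synthesis problem adds the constraint

$$\sum_{s \in S} \xi_s \le \Gamma,$$

so that tightening Γ makes the otherwise unbounded maximum entropy finite, and relaxing Γ lets the achievable entropy grow, as the excerpt notes.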
“…The methods introduced in [4] and [5] use the Fisher information to preserve privacy for database systems and smart meters, respectively, and they do not deal with MDPs. Planning in stochastic control settings in the presence of an adversary has been substantially explored previously; the works closest to our paper are [6]-[8]. The reference [6] provides a method for multi-agent perimeter patrolling scenarios and is not applicable to MDPs in general.…”
Section: Introduction
confidence: 99%
“…The principle of maximum causal entropy extends the maximum entropy principle to settings where there is dynamically revealed side information that causally affects the evolution of a stochastic process [5], [6]. A distribution that maximizes the causal entropy of a stochastic process (in the absence of additional constraints) is the one that makes all admissible realizations equally probable regardless of the revealed information [7]. Therefore, the causal entropy of a player's strategy provides a convenient way to quantify the dependence of its strategy on its level of information about the environment as well as on the other player's strategy.…”
Section: Introduction
confidence: 99%
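For reference, the causally conditioned entropy this excerpt appeals to is standardly written (our addition, following the usual definition in the maximum causal entropy literature; the notation is ours) as

$$H\!\left(X^{T} \,\|\, Y^{T}\right) = \sum_{t=1}^{T} H\!\left(X_t \mid X^{t-1}, Y^{t}\right),$$

i.e., each $X_t$ is conditioned only on the information revealed up to time $t$, which is exactly the causal restriction the excerpt describes.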
“…Related Work. A recent study [5] showed that an entropy-maximizing controller for an MDP could be synthesized efficiently by solving a convex optimization problem. In POMDPs, entropy has often been used for active sensing applications [11]-[13], where an agent seeks to select actions that maximize its information gain from the environment.…”
Section: Introduction
confidence: 99%
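To illustrate the kind of convex program this excerpt refers to, below is a minimal sketch in Python using cvxpy. It is not the cited paper's exact formulation (which maximizes the entropy of the path distribution under a temporal logic constraint, working with expected residence times); instead it solves the closely related problem of maximizing the discounted causal entropy of a stationary policy over occupancy measures. The toy MDP numbers are made up for illustration.

import numpy as np
import cvxpy as cp

# Toy MDP (hypothetical): 3 states, 2 actions, discount factor gamma.
nS, nA = 3, 2
gamma = 0.95
mu0 = np.array([1.0, 0.0, 0.0])  # initial state distribution

# P[a][s, s'] = transition probability under action a (made-up numbers).
P = np.zeros((nA, nS, nS))
P[0] = [[0.8, 0.2, 0.0], [0.0, 0.5, 0.5], [0.1, 0.0, 0.9]]
P[1] = [[0.1, 0.1, 0.8], [0.3, 0.7, 0.0], [0.0, 0.2, 0.8]]

x = cp.Variable((nS, nA), nonneg=True)  # occupancy measures x[s, a]
xs = cp.sum(x, axis=1)                  # state occupancies x[s]

# Bellman-flow constraints defining valid discounted occupancy measures:
#   x[s] = mu0[s] + gamma * sum_{s', a'} P(s | s', a') * x[s', a'].
flow = []
for s in range(nS):
    M = np.stack([P[a][:, s] for a in range(nA)], axis=1)  # M[s', a] = P(s|s', a)
    flow.append(xs[s] == mu0[s] + gamma * cp.sum(cp.multiply(x, M)))

# Discounted causal entropy of the induced policy:
#   sum_{s, a} -x[s, a] * log(x[s, a] / x[s]),  which is concave in x.
xs_tiled = cp.vstack([xs] * nA).T  # broadcast x[s] across actions
objective = cp.Maximize(-cp.sum(cp.rel_entr(x, xs_tiled)))

prob = cp.Problem(objective, flow)
prob.solve()

# Recover the entropy-maximizing policy: pi(a | s) = x[s, a] / x[s].
# (Safe here because every state has positive occupancy in this toy MDP.)
pi = x.value / x.value.sum(axis=1, keepdims=True)
print("max discounted causal entropy:", prob.value)
print("policy:\n", pi)

The program is convex because the relative-entropy atom cp.rel_entr is jointly convex in its arguments and the flow constraints are affine, so an off-the-shelf exponential-cone solver handles it directly; this mirrors the excerpt's point that entropy-maximizing controller synthesis reduces to convex optimization.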