2012 American Control Conference (ACC)
DOI: 10.1109/acc.2012.6314926
Online Markov decision processes with Kullback-Leibler control cost

Cited by 6 publications (3 citation statements) | References 36 publications
“…We are aware of only a few cases where (relative) entropy regularization has been combined with Markov decision processes and related models. [21] consider a generalization of the Markov decision process in which, instead of acting on the process through a set of control actions, the agent directly manipulates the transition matrix of the system state. Such manipulation, however, incurs a cost proportional to the relative entropy between the manipulated transition probabilities and the transition matrix of a 'passive' process that models the 'natural' system evolution.…”
Section: Related Work
confidence: 99%
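The relative-entropy cost described above can be sketched numerically. The following is a minimal illustration (not the paper's implementation); the example distributions, and the assumption that the controlled distribution is absolutely continuous with respect to the passive one, are hypothetical choices for demonstration:

```python
import numpy as np

def kl_control_cost(p_controlled, p_passive):
    """Relative entropy D(p_controlled || p_passive) between the
    manipulated transition distribution and the passive dynamics.
    Assumes p_controlled puts no mass where p_passive has none."""
    p = np.asarray(p_controlled, dtype=float)
    q = np.asarray(p_passive, dtype=float)
    mask = p > 0  # terms with p = 0 contribute 0 to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical passive dynamics from some state, and a controlled
# choice that steers probability mass toward the second successor.
passive = np.array([0.5, 0.3, 0.2])
controlled = np.array([0.2, 0.7, 0.1])
cost = kl_control_cost(controlled, passive)
```

Leaving the dynamics unchanged costs nothing (the relative entropy of a distribution with itself is zero), while any deviation from the passive dynamics incurs a strictly positive cost.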
“…Given a randomized stationary policy π with stationary state distribution d_π, the MDP is a Markov chain with transition matrix P_π given by (3). Thus, it must satisfy the following balance equation:…”
Section: OCMDP Algorithm, 2.1 Preliminaries
confidence: 99%
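The balance equation quoted above, d_π^T P_π = d_π^T, says the stationary distribution is a left eigenvector of the induced transition matrix with eigenvalue 1. A minimal sketch, using a hypothetical 3-state chain (the matrix P_pi below is illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 3-state chain induced by a fixed randomized policy π:
# P_pi[i, j] is the probability of moving from state i to state j.
P_pi = np.array([
    [0.1, 0.6, 0.3],
    [0.4, 0.4, 0.2],
    [0.5, 0.2, 0.3],
])

# d_π is a left eigenvector of P_π with eigenvalue 1,
# normalized to sum to one.
eigvals, eigvecs = np.linalg.eig(P_pi.T)
idx = np.argmin(np.abs(eigvals - 1.0))
d_pi = np.real(eigvecs[:, idx])
d_pi = d_pi / d_pi.sum()

# Residual of the balance equation d_π^T P_π = d_π^T.
residual = np.linalg.norm(d_pi @ P_pi - d_pi)
```

Since every entry of this P_pi is positive, the chain is irreducible and the stationary distribution is unique and strictly positive.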
“…These works, by leveraging a similar framework, formalize the control problem as the (unconstrained) problem of minimizing a cost that captures the discrepancy between an ideal probability density function and the actual probability density function of the system under control. An online version of these algorithms is proposed in Guan et al. (2014), where an average-cost formulation is used to find the probability mass function for the state transitions. Finally, we also recall Russo (2021) and Garrabe and Russo (2022), where policies are obtained by minimizing similar costs over multiple, specialized datasets.…”
Section: Introduction
confidence: 99%