2022
DOI: 10.1109/tcyb.2021.3102510

Parameterized MDPs and Reinforcement Learning Problems—A Maximum Entropy Principle-Based Framework

Abstract: We present a framework to address a class of sequential decision making problems. Our framework features learning the optimal control policy with robustness to noisy data, determining the unknown state and action parameters, and performing sensitivity analysis with respect to problem parameters. We consider two broad categories of sequential decision making problems modeled as infinite horizon Markov Decision Processes (MDPs) with (and without) an absorbing state. The central idea underlying our framework is t…

Cited by 5 publications (12 citation statements)
References 34 publications
“…Remark 1. To ensure that the cumulative cost J^μ_{ζη}(s) is finite for all s ∈ S and the system reaches the cost-free termination state δ in finite steps, we assume that there exists at least one proper policy μ(a|s) ∈ {0, 1} ∀a ∈ A, s ∈ S, and for all parameter values in ζ and η, under which there is a non-zero probability to reach the cost-free termination state δ starting from any state s ∈ S (please see [12] for proof).…”
Section: Problem Formulation (mentioning)
confidence: 99%
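
The proper-policy assumption quoted above can be checked mechanically for a small finite MDP. The following is a minimal sketch, not the cited authors' method: it assumes a deterministic policy stored as a dict from state to action, and a hypothetical `transitions[s][a]` dict mapping each next state to its probability. The function name `is_proper` and the toy MDP are illustrative only. The check runs a backward breadth-first search from the termination state over transitions that have positive probability under the policy.

```python
from collections import deque


def is_proper(policy, transitions, terminal):
    """Return True if `terminal` is reachable with positive probability
    from every state in `policy` when following that deterministic policy.

    policy:      dict state -> action (hypothetical representation)
    transitions: dict state -> action -> {next_state: probability}
    terminal:    the cost-free termination state (delta in the quote)
    """
    # Reverse graph: predecessors[t] holds every state s with
    # P(t | s, policy[s]) > 0.
    predecessors = {}
    for s, a in policy.items():
        for t, p in transitions[s][a].items():
            if p > 0:
                predecessors.setdefault(t, set()).add(s)

    # Backward breadth-first search starting at the termination state.
    reached = {terminal}
    queue = deque([terminal])
    while queue:
        t = queue.popleft()
        for s in predecessors.get(t, ()):
            if s not in reached:
                reached.add(s)
                queue.append(s)

    # The policy is proper if every non-terminal state was reached.
    return all(s in reached for s in policy)


# Toy MDP (assumed for illustration): two states and a termination state.
transitions = {
    0: {"go": {1: 1.0}, "stay": {0: 1.0}},
    1: {"go": {"delta": 0.5, 0: 0.5}, "stay": {1: 1.0}},
}
good_policy = {0: "go", 1: "go"}    # every state can reach "delta"
bad_policy = {0: "stay", 1: "go"}   # state 0 loops on itself forever

print(is_proper(good_policy, transitions, "delta"))
print(is_proper(bad_policy, transitions, "delta"))
```

In a finite state space, a positive-probability path to the termination state from every state is what guarantees termination in finitely many steps with probability one, which is why a plain reachability test is enough for this sketch.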
“…Further, due to the additional state and action parameters it is difficult to model para-SDMs directly by using the existing frameworks [2,3,4,5] that model SDMs. We have addressed the static para-SDMs in [12].…”
Section: Introduction (mentioning)
confidence: 99%