2003
DOI: 10.1016/s0004-3702(02)00376-4

Equivalence notions and model minimization in Markov decision processes

Cited by 221 publications (270 citation statements)
References 9 publications
“…First, it is easy to see that ε-structured MDPs subsume previous notions of similarity like (approximate) state aggregation in MDPs [14], MDP homomorphism [15], or lax bisimulation [18]. In state aggregation, one merges states to meta-states when their rewards and transition probabilities are identical or close.…”
Section: MDP Aggregation, MDP Homomorphism, and ε-Structured MDPs
confidence: 99%
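As an illustration of the aggregation criterion described in this excerpt, here is a minimal Python sketch (hypothetical names; not the cited paper's algorithm) that merges states into meta-states whenever their rewards and transition probability vectors agree to within a tolerance eps for every action:

```python
import numpy as np

def epsilon_aggregate(R, P, eps):
    """Greedy epsilon-aggregation sketch (illustrative, not the paper's algorithm).

    R: (S, A) array of expected rewards.
    P: (S, A, S) array of transition probabilities.
    Two states are merged into the same meta-state when, for every action,
    their rewards differ by at most eps and their transition probability
    vectors differ by at most eps in total variation distance.
    Returns a list mapping each state to its meta-state index.
    """
    S = R.shape[0]
    labels = [-1] * S          # meta-state index for each state
    representatives = []       # one representative state per meta-state

    for s in range(S):
        for i, r in enumerate(representatives):
            reward_close = np.all(np.abs(R[s] - R[r]) <= eps)
            trans_close = np.all(0.5 * np.abs(P[s] - P[r]).sum(axis=-1) <= eps)
            if reward_close and trans_close:
                labels[s] = i
                break
        else:
            labels[s] = len(representatives)
            representatives.append(s)
    return labels
```

With eps = 0 this reduces to exact state aggregation; a larger eps trades a smaller aggregated model against approximation error.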
“…Section 2 defines the setting; in Section 3 we give some examples of the restless bandit problem and demonstrate that index-based policies are suboptimal. Section 4 presents the main results: the upper and lower bounds on the achievable regret in the considered problem. Sections 5 and 7 introduce the algorithm for which the upper bound is proven; the latter part relies on ε-structured MDPs, a generalization of concepts like (approximate) state aggregation in MDPs [14] and MDP homomorphism [15], introduced in Section 6. This section also presents an extension of the Ucrl2 algorithm of [6] designed to work in this setting.…”
Section: Introduction
confidence: 99%
“…Specifically, we define a measure of reward approximation error and transition probability approximation error achieved by state and action abstraction, such that the regret of the equilibrium found in the abstract game, when implemented in the original, unabstracted game, is upper-bounded by some function of those measures. The analysis is in some ways similar to that of abstraction in Markov decision processes [Givan et al. 2003; Ravindran 2004; Sorg and Singh 2009], but for the richer, and much more difficult, setting of games.…”
Section: Introduction
confidence: 99%
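A rough illustration of the kind of approximation-error measures mentioned in this excerpt, stated for the simpler MDP setting rather than for games; the quantities eps_R and eps_P below are assumed, illustrative definitions, not those of the cited paper:

```python
import numpy as np

def abstraction_errors(R, P, labels):
    """Reward and transition approximation error of a state aggregation (illustrative sketch).

    R: (S, A) rewards, P: (S, A, S) transitions, labels: meta-state index per state.
    eps_R = largest within-block spread of rewards, over all blocks and actions.
    eps_P = largest total-variation gap, measured over meta-states, between the
            transition distributions of any two states in the same block.
    """
    labels = np.asarray(labels)
    n_blocks = labels.max() + 1
    # Transition mass aggregated into meta-states: shape (S, A, n_blocks)
    P_block = np.stack([P[:, :, labels == b].sum(axis=-1) for b in range(n_blocks)], axis=-1)

    eps_R, eps_P = 0.0, 0.0
    for b in range(n_blocks):
        members = np.where(labels == b)[0]
        eps_R = max(eps_R, float((R[members].max(axis=0) - R[members].min(axis=0)).max()))
        for i in members:
            for j in members:
                tv = 0.5 * np.abs(P_block[i] - P_block[j]).sum(axis=-1).max()
                eps_P = max(eps_P, float(tv))
    return eps_R, eps_P
```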
“…Here we are concerned with state aggregation (for references see [1]), which exploits the idea that similar states (with respect to rewards and transition probabilities) may be aggregated into meta-states, so that calculation of the optimal policy may then be conducted on the meta-MDP.…”
Section: Introduction
confidence: 99%
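To make the meta-MDP idea in this excerpt concrete, the following sketch builds an aggregated model from a given state partition and solves it by standard value iteration; the weighted-average construction and the name solve_meta_mdp are assumptions for illustration, not the cited references' exact construction:

```python
import numpy as np

def solve_meta_mdp(R, P, labels, weights=None, gamma=0.95, iters=1000):
    """Build a meta-MDP from a state aggregation and solve it by value iteration.

    A minimal sketch (assumed construction): meta-rewards and meta-transitions
    are weighted averages over each block's member states, with uniform weights
    unless `weights` is given. Returns the meta-state values and a greedy
    meta-policy (one action index per meta-state).
    """
    S, A = R.shape
    labels = np.asarray(labels)
    B = labels.max() + 1
    w = np.ones(S) / S if weights is None else np.asarray(weights, dtype=float)

    R_meta = np.zeros((B, A))
    P_meta = np.zeros((B, A, B))
    for b in range(B):
        members = np.where(labels == b)[0]
        wb = w[members] / w[members].sum()
        R_meta[b] = wb @ R[members]                        # averaged block reward
        P_to_states = np.einsum('m,mas->as', wb, P[members])
        for c in range(B):
            P_meta[b, :, c] = P_to_states[:, labels == c].sum(axis=-1)

    V = np.zeros(B)
    for _ in range(iters):                                 # standard value iteration
        Q = R_meta + gamma * P_meta @ V
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

The returned meta-policy can then be lifted back to the original MDP by applying a meta-state's chosen action in every member state of that block.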