2003
DOI: 10.1016/s0004-3702(02)00376-4

Equivalence notions and model minimization in Markov decision processes

Cited by 221 publications (270 citation statements)
References 9 publications
“…First, it is easy to see that ε-structured MDPs subsume previous notions of similarity like (approximate) state aggregation in MDPs [14], MDP homomorphism [15], or lax bisimulation [18]. In state aggregation, one merges states to meta-states when their rewards and transition probabilities are identical or close.…”
Section: MDP Aggregation, MDP Homomorphism, and ε-Structured MDPs
confidence: 99%
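As an illustration of the aggregation criterion described in this excerpt, here is a minimal Python sketch (hypothetical names; not the cited paper's algorithm) that merges states into meta-states whenever their rewards and transition probability vectors agree to within a tolerance eps for every action:

```python
import numpy as np

def epsilon_aggregate(R, P, eps):
    """Greedy epsilon-aggregation sketch (illustrative, not the paper's algorithm).

    R: (S, A) array of expected rewards.
    P: (S, A, S) array of transition probabilities.
    Two states are merged into the same meta-state when, for every action,
    their rewards differ by at most eps and their transition probability
    vectors differ by at most eps in total variation distance.
    Returns a list mapping each state to its meta-state index.
    """
    S = R.shape[0]
    labels = [-1] * S          # meta-state index for each state
    representatives = []       # one representative state per meta-state

    for s in range(S):
        for i, r in enumerate(representatives):
            reward_close = np.all(np.abs(R[s] - R[r]) <= eps)
            trans_close = np.all(0.5 * np.abs(P[s] - P[r]).sum(axis=-1) <= eps)
            if reward_close and trans_close:
                labels[s] = i
                break
        else:
            labels[s] = len(representatives)
            representatives.append(s)
    return labels
```

With eps = 0 this reduces to exact state aggregation; a larger eps trades a smaller aggregated model against approximation error.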
“…Section 2 defines the setting; in Section 3 we give some examples of the restless bandit problem and demonstrate that index-based policies are suboptimal. Section 4 presents the main results: the upper and lower bounds on the achievable regret in the considered problem. Sections 5 and 7 introduce the algorithm for which the upper bound is proven; the latter part relies on ε-structured MDPs, a generalization of concepts like (approximate) state aggregation in MDPs [14] and MDP homomorphism [15], introduced in Section 6. This section also presents an extension of the Ucrl2 algorithm of [6] designed to work in this setting.…”
Section: Introduction
confidence: 99%
“…Specifically, we define a measure of reward approximation error and transition probability approximation error achieved by state and action abstraction, such that the regret of the equilibrium found in the abstract game, when implemented in the original, unabstracted game, is upper-bounded by some function of those measures. The analysis is in some ways similar to that of abstraction in Markov decision processes [Givan et al. 2003; Ravindran 2004; Sorg and Singh 2009], but for the richer, and much more difficult, setting of games.…”
Section: Introduction
confidence: 99%
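A rough illustration of the kind of approximation-error measures mentioned in this excerpt, stated for the simpler MDP setting rather than for games; the quantities eps_R and eps_P below are assumed, illustrative definitions, not those of the cited paper:

```python
import numpy as np

def abstraction_errors(R, P, labels):
    """Reward and transition approximation error of a state aggregation (illustrative sketch).

    R: (S, A) rewards, P: (S, A, S) transitions, labels: meta-state index per state.
    eps_R = largest within-block spread of rewards, over all blocks and actions.
    eps_P = largest total-variation gap, measured over meta-states, between the
            transition distributions of any two states in the same block.
    """
    labels = np.asarray(labels)
    n_blocks = labels.max() + 1
    # Transition mass aggregated into meta-states: shape (S, A, n_blocks)
    P_block = np.stack([P[:, :, labels == b].sum(axis=-1) for b in range(n_blocks)], axis=-1)

    eps_R, eps_P = 0.0, 0.0
    for b in range(n_blocks):
        members = np.where(labels == b)[0]
        eps_R = max(eps_R, float((R[members].max(axis=0) - R[members].min(axis=0)).max()))
        for i in members:
            for j in members:
                tv = 0.5 * np.abs(P_block[i] - P_block[j]).sum(axis=-1).max()
                eps_P = max(eps_P, float(tv))
    return eps_R, eps_P
```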
“…Here we are concerned with state aggregation (for references see [1]), which exploits the idea that similar states (with respect to rewards and transition probabilities) may be aggregated into meta-states, so that calculation of the optimal policy may then be conducted on the meta-MDP.…”
Section: Introduction
confidence: 99%
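To make the meta-MDP idea in this excerpt concrete, the following sketch builds an aggregated model from a given state partition and solves it by standard value iteration; the weighted-average construction and the name solve_meta_mdp are assumptions for illustration, not the cited references' exact construction:

```python
import numpy as np

def solve_meta_mdp(R, P, labels, weights=None, gamma=0.95, iters=1000):
    """Build a meta-MDP from a state aggregation and solve it by value iteration.

    A minimal sketch (assumed construction): meta-rewards and meta-transitions
    are weighted averages over each block's member states, with uniform weights
    unless `weights` is given. Returns the meta-state values and a greedy
    meta-policy (one action index per meta-state).
    """
    S, A = R.shape
    labels = np.asarray(labels)
    B = labels.max() + 1
    w = np.ones(S) / S if weights is None else np.asarray(weights, dtype=float)

    R_meta = np.zeros((B, A))
    P_meta = np.zeros((B, A, B))
    for b in range(B):
        members = np.where(labels == b)[0]
        wb = w[members] / w[members].sum()
        R_meta[b] = wb @ R[members]                        # averaged block reward
        P_to_states = np.einsum('m,mas->as', wb, P[members])
        for c in range(B):
            P_meta[b, :, c] = P_to_states[:, labels == c].sum(axis=-1)

    V = np.zeros(B)
    for _ in range(iters):                                 # standard value iteration
        Q = R_meta + gamma * P_meta @ V
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```

The returned meta-policy can then be lifted back to the original MDP by applying a meta-state's chosen action in every member state of that block.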