2010
DOI: 10.1145/1721837.1721849

Discounted deterministic Markov decision processes and discounted all-pairs shortest paths

Abstract: We present algorithms for finding optimal strategies for discounted, infinite-horizon, Deterministic Markov Decision Processes (DMDPs). Our fastest algorithm has a worst-case running time of O(mn), improving the recent bound of O(mn^2) obtained by Andersson and Vorobyov [2006]. We also present a randomized O(m^{1/2} n^2)-time algorithm for finding Discounted All-Pairs Shortest Paths (DAPSP), improving an O(mn^2)-time algorithm that can be obtained using ideas of Papadimitriou and Tsitsiklis [1987]. ACM Referenc…
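To make the problem statement concrete, here is a minimal sketch of a discounted DMDP solved by plain value iteration. This is not the O(mn) algorithm from the paper; it is the textbook fixed-point method, shown only to illustrate what an "optimal strategy" means here. The function name `dmdp_value_iteration`, the cost-minimizing convention, and the edge-list input format are assumptions made for this illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source state, target state, immediate cost)


def dmdp_value_iteration(
    n: int,
    edges: List[Edge],
    discount: float,
    tol: float = 1e-10,
    max_iters: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Approximate optimal discounted costs and a greedy strategy.

    Hypothetical helper (not from the paper): states are 0..n-1, each edge
    is a deterministic action, and we minimize total discounted cost with
    0 <= discount < 1.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")

    # Group outgoing actions by source state.
    out: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for u, v, c in edges:
        out[u].append((v, c))
    if any(not actions for actions in out):
        raise ValueError("every state needs at least one outgoing edge")

    values = [0.0] * n
    for _ in range(max_iters):
        # Bellman update: best immediate cost plus discounted successor value.
        new_values = [
            min(c + discount * values[v] for v, c in out[u]) for u in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy strategy: for each state, the successor that attains the minimum.
    policy = [
        min(out[u], key=lambda vc: vc[1] + discount * values[vc[0]])[0]
        for u in range(n)
    ]
    return values, policy


# Tiny example: state 2 is a free self-loop; state 0 can loop (cost 2) or move on.
example_edges = [(0, 1, 1.0), (0, 0, 2.0), (1, 2, 1.0), (2, 2, 0.0)]
values, policy = dmdp_value_iteration(3, example_edges, discount=0.5)
```

In this example the strategy sends state 0 to state 1 rather than looping, because the discounted cost of the path through state 1 is lower than that of paying 2 forever. Value iteration only converges geometrically to the optimum, which is part of why strongly polynomial bounds such as O(mn) are of interest.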

Citations: cited by 16 publications (29 citation statements)
References: 27 publications (28 reference statements)
“…The fastest known algorithm for uniformly discounted deterministic MDPs runs in time O(mn) [MTZ10]. However, these problems were not known to be solvable in polynomial time with the more-generic simplex method.…”
Section: Introduction (mentioning)
confidence: 99%
“…When the IC paths are found, the lowest payoffs for each player in Step 2 can be found in O(mn) time (Madani et al. 2010; Papadimitriou and Tsitsiklis 1987), where n is the number of nodes and m is the number of edges in the finite graph of IC paths. The task is essentially the same as finding the optimal strategies for discounted, infinite-horizon, deterministic Markov decision processes (DMDPs).…”
Section: Methods (mentioning)
confidence: 99%
“…If we look beyond PI, we find even subexponential bounds on the expected running time of MDP planning. Bounds of the form poly(n, k) · exp(O(√(n log(n)))) [Matoušek et al, 1996] follow directly from posing MDP planning as a linear program with n variables and nk constraints [Littman et al, 1995]. The special structure that results when k = 2 admits an even tighter bound of poly(n) · exp(2√n) [Gärtner, 2002].…”
Section: Related Work and Contribution (mentioning)
confidence: 99%
“…Alternatively, if we fix n, the linear programming route can yield strong worst-case bounds that are linear in k: for example, [Megiddo, 1984] and n^{O(n)} · k [Chazelle and Matousek, 1996]. It must also be noted that for deterministic MDPs, strong worst-case bounds of the form poly(n, k) are indeed possible [Madani et al, 2010; Post and Ye, 2013].…”
Section: Related Work and Contribution (mentioning)
confidence: 99%