Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/248

Improved Strong Worst-case Upper Bounds for MDP Planning

Abstract: The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of states n and the number of actions k in the specified MDP; they have no dependence on affiliated variables such as the…
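To make "computing an optimal policy for a given MDP" concrete, here is a minimal sketch of textbook policy iteration on a toy MDP with n = 2 states and k = 2 actions. It is not the algorithm analysed in the paper, and everything in it is an assumption made for illustration: the names `evaluate` and `policy_iteration`, the transition table `T`, the reward table `R`, the discount factor `gamma`, and the use of iterative (approximate) policy evaluation.

```python
def evaluate(policy, T, R, gamma, tol=1e-10):
    """Approximate the value of a fixed policy by repeated in-place backups."""
    n = len(T)
    v = [0.0] * n
    while True:
        delta = 0.0
        for s in range(n):
            a = policy[s]
            new = R[s][a] + gamma * sum(T[s][a][t] * v[t] for t in range(n))
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v


def policy_iteration(T, R, gamma):
    """Alternate evaluation and greedy improvement until no state switches."""
    n, k = len(T), len(T[0])
    policy = [0] * n  # arbitrary initial policy (assumption)
    iterations = 0
    while True:
        iterations += 1
        v = evaluate(policy, T, R, gamma)
        changed = False
        for s in range(n):
            q = [R[s][a] + gamma * sum(T[s][a][t] * v[t] for t in range(n))
                 for a in range(k)]
            best = max(range(k), key=lambda a: q[a])
            # Switch only on a clear improvement, so numerical noise from the
            # approximate evaluation cannot cause endless switching.
            if q[best] > q[policy[s]] + 1e-9:
                policy[s] = best
                changed = True
        if not changed:
            return policy, v, iterations


# Hypothetical toy MDP: T[s][a][t] is the probability of moving from s to t
# under action a; R[s][a] is the immediate reward.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to 0
]
R = [
    [0.0, 0.0],  # no reward in state 0
    [1.0, 0.0],  # reward 1 for staying in state 1
]

policy, values, iterations = policy_iteration(T, R, gamma=0.9)
print(policy, [round(x, 6) for x in values], iterations)
```

On this toy instance the loop settles on moving from state 0 to state 1 and then staying there. The count of outer-loop rounds is the kind of quantity that a bound depending only on n and k would constrain.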

Cited by 1 publication (3 citation statements). References 13 publications.

“…The convergence of these algorithms has been shown and their efficiency has been assessed experimentally. In particular, the algorithm of Policy Iteration appears to be experimentally faster than the algorithm of Bounded Utility Value Iteration (which is also usual in the stochastic case [16]).…”
Section: Results (mentioning)
confidence: 99%
“…When the horizon of the MDP is not finite, equations (16) and (17) are not enough to rank-order the policies. The length of the trajectories may be infinite, as well as their number.…”
Section: Algorithm 4: Bounded Utility Lmax(lmin) Value Iteration (Bu-… (mentioning)
confidence: 99%