Proceedings of the 41st IEEE Conference on Decision and Control, 2002.
DOI: 10.1109/cdc.2002.1184956

Markov decision processes with uncertain transition rates: sensitivity and robust control

Cited by 18 publications (16 citation statements). References 6 publications.
“…Our work is motivated by the fact that in many practical problems, the transition matrices have to be estimated from data, and this may be a difficult task; see, for example, Kalyanasundaram et al (2001), Feinberg and Shwartz (2002), Abbad and Filar (1992), and Abbad et al (1992). It turns out that estimation errors may have a huge impact on the solution, which is often quite sensitive to changes in the transition probabilities.…”
Section: Introduction
Confidence: 99%
“…Inspired by the work of Satia and Lave [35], models involving imprecision have also been applied to the related field of Markov decision processes. Some studies have been devoted to obtaining max-min policies, max-max policies, and maximal policies when the MDP model is not accurate [5,23].…”
Section: Related Work
Confidence: 99%
“…With those uncertain transition matrices, robust dynamic programming is needed to design approximation methods with appropriate robustness, extending the power of the Bellman equation. Representative efforts in developing robust dynamic programming can be found in [6]–[10]. One commonly used optimality criterion for robust algorithms is to optimize the worst-case value function over all initial states, which is referred to as the robust uniform optimality criterion in this paper.…”
Section: Introduction
Confidence: 99%
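The max-min policies and the robust Bellman backup mentioned in the citation statements above can be sketched in a few lines. The following is a minimal, hypothetical toy example (the MDP, its rewards, and the candidate transition matrices are invented for illustration, not taken from the paper): the uncertain transition model is represented as a finite set of candidate transition matrices per action, and each backup takes the worst case (min) over that set before maximizing over actions.

```python
# Robust value iteration sketch under transition-matrix uncertainty.
# Uncertainty set: a finite list of candidate transition matrices per action.
# All numbers below are illustrative assumptions, not from the cited paper.

GAMMA = 0.9  # discount factor (assumed)

# Toy 2-state, 2-action MDP. For each action, a list of candidate
# transition matrices (rows: current state, columns: next state).
CANDIDATES = {
    0: [  # action 0
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.7, 0.3], [0.4, 0.6]],
    ],
    1: [  # action 1
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.6, 0.4], [0.3, 0.7]],
    ],
}
REWARD = {0: [1.0, 0.0], 1: [0.5, 0.8]}  # REWARD[action][state] (assumed)


def robust_backup(V):
    """One robust Bellman backup: max over actions, min over models."""
    new_V = []
    for s in range(len(V)):
        best = float("-inf")
        for a, models in CANDIDATES.items():
            # Worst-case expected continuation value over the uncertainty set.
            worst = min(
                sum(P[s][sp] * V[sp] for sp in range(len(V))) for P in models
            )
            best = max(best, REWARD[a][s] + GAMMA * worst)
        new_V.append(best)
    return new_V


def robust_value_iteration(n_states=2, tol=1e-8):
    """Iterate the robust backup to (approximate) fixed point."""
    V = [0.0] * n_states
    while True:
        new_V = robust_backup(V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
```

Because the robust backup is still a gamma-contraction, the iteration converges to the unique robust value function, and the max-min policy is read off by taking the maximizing action in each state at the fixed point.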