2005
DOI: 10.1287/opre.1050.0216

Robust Control of Markov Decision Processes with Uncertain Transition Matrices

Abstract: Optimal solutions to Markov decision problems may be very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of these probabilities is far from accurate. Hence, estimation errors are limiting factors in applying Markov decision processes to real-world problems. We consider a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets. We show t…
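To make the robust formulation concrete, here is a minimal sketch of value iteration with a worst-case Bellman backup. All names (`robust_value_iteration`, `P_hat`, `R`, `radius`) are hypothetical, and the uncertainty set used (a total-variation/L1 ball around each estimated transition row) is only one simple choice for illustration, not necessarily the set families treated in the paper.

```python
import numpy as np

def robust_value_iteration(P_hat, R, radius, gamma=0.95, iters=500):
    """Sketch of robust value iteration for a finite MDP.

    P_hat[s, a] is the estimated transition row for state s and action a,
    R[s, a] the reward, and `radius` an L1 (total-variation) bound on how
    far the true row may deviate from the estimate.  This is one simple
    uncertainty set chosen for illustration only.
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p = P_hat[s, a].copy()
                # Worst case over the L1 ball: move up to radius/2 mass
                # away from the highest-value successor states and put it
                # on the lowest-value one, keeping p a distribution.
                order = np.argsort(V)          # ascending in value
                take = radius / 2.0
                for j in order[::-1]:          # strip mass from best states
                    d = min(p[j], take)
                    p[j] -= d
                    take -= d
                    if take <= 1e-15:
                        break
                p[order[0]] += radius / 2.0 - take  # dump it on the worst state
                Q[s, a] = R[s, a] + gamma * p @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-8:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)
```

In practice the radius for each row would be tied to how much data supports the estimate, e.g., via a confidence region around the empirical transition frequencies; with `radius = 0` the backup reduces to ordinary value iteration.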

Cited by 613 publications (737 citation statements)
References 20 publications
“…In Section 4 we describe three families of sets of conditional measures that are based on the confidence regions, and show that the computational effort required to solve the robust DP corresponding to these sets is only modestly higher than that required to solve the nonrobust counterpart. The results in this section, although independently obtained, are not new and were first obtained by Nilim and El Ghaoui (2002). In Section 5 we provide basic examples and computational results.…”
Section: Introduction (mentioning)
Confidence: 87%
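The "only modestly higher" computational effort can be illustrated with the inner worst-case problem for one common statistically motivated family of sets: a relative-entropy (KL) ball around the estimated row. The sketch below is an assumption-laden illustration (hypothetical names `worst_case_kl`, `p_hat`, `beta`), not the exact sets or algorithm of the quoted works; its point is that Lagrangian duality reduces the worst case to a one-dimensional concave search, so each robust backup adds only a scalar optimization to an ordinary backup.

```python
import numpy as np

def worst_case_kl(p_hat, V, beta, lam_max=1e4, tol=1e-10):
    """Worst-case expected value min_p p.V over {p : KL(p || p_hat) <= beta}.

    By duality this equals  sup_{lam > 0}  -lam * log(sum_i p_hat[i] *
    exp(-V[i]/lam)) - lam * beta,  a concave 1-D maximization, so it is
    solved here by a simple golden-section search.
    """
    def g(lam):
        # Dual objective, with a numerically stable log-sum-exp.
        z = -V / lam
        m = z.max()
        return -lam * (m + np.log(np.dot(p_hat, np.exp(z - m)))) - lam * beta

    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-8, lam_max
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol * (1.0 + b):
        if g(c) > g(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return g(0.5 * (a + b))
```

With `beta = 0` the search recovers the nominal expectation `p_hat @ V`; larger `beta` (a looser confidence region) drives the worst-case value down toward `V.min()`.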
“…In current practice, these errors are ignored and the optimal policy is computed assuming that the estimate is, indeed, the true transition probability. The DP optimal policy is quite sensitive to perturbations in the transition probability, and ignoring the estimation errors can lead to serious degradation in performance (Nilim and El Ghaoui, 2002; Tsitsiklis et al., 2002). Degradation in performance due to estimation errors in parameters has also been observed in other contexts (Ben-Tal and Nemirovski, 1997; Goldfarb and Iyengar, 2003).…”
Section: Introduction (mentioning)
Confidence: 99%
“…The robust optimal policy for an uncertain MDP can be computed using various methods [10], [16], [15].…”
Section: A Two-Step Solution for UMDP (mentioning)
Confidence: 99%
“…However, none of these methods is robust in the presence of modeling uncertainty. On the other hand, for MDPs with uncertain parameters, robust MDPs have been extensively studied [10], [8], [15]. Recently, robust control of MDPs has been extended to handle expressive temporal logic constraints [16], [3], [14].…”
Section: Introduction (mentioning)
Confidence: 99%