Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use due to their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a pre-specified probability 1 − β. Afterwards, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1 − β. Our method involves the solution of tractable conic programs of moderate size.

Keywords: Robust Optimization; Markov Decision Processes; Semidefinite Programming.

Notation: For a finite set X = {1, …, X}, M(X) denotes the probability simplex in R^X. An X-valued random variable χ has distribution m ∈ M(X), denoted by χ ∼ m, if P(χ = x) = m_x for all x ∈ X. By default, all vectors are column vectors. We denote by e_k the kth canonical basis vector, while e denotes the vector whose components are all ones. In both cases, the dimension will usually be clear from the context. For square matrices A and B, the relation A ⪰ B indicates that the matrix A − B is positive semidefinite. We denote the space of symmetric n × n matrices by S^n. The declaration f : X → Y (f : X ⇒ Y) implies that f is a continuous (affine) function from X to Y. For a matrix A, we denote its ith row by A_i· (a row vector) and its jth column by A_·j.
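The notational conventions above can be made concrete in a short NumPy sketch. This is purely illustrative and not part of the paper; all function names here are our own, and numerical tolerances are an implementation choice:

```python
import numpy as np

def in_simplex(m, tol=1e-9):
    """Check membership in the probability simplex M(X): m is a
    distribution over a finite set iff its entries are nonnegative
    and sum to one (up to a numerical tolerance)."""
    m = np.asarray(m, dtype=float)
    return bool(np.all(m >= -tol) and abs(m.sum() - 1.0) <= tol)

def psd_geq(A, B, tol=1e-9):
    """Check the ordering A ⪰ B, i.e. that A - B is positive
    semidefinite; for symmetric matrices this holds iff the smallest
    eigenvalue of A - B is nonnegative."""
    D = np.asarray(A, dtype=float) - np.asarray(B, dtype=float)
    D = (D + D.T) / 2  # symmetrize against round-off before eigvalsh
    return bool(np.min(np.linalg.eigvalsh(D)) >= -tol)

def e_k(k, n):
    """The kth canonical basis vector in R^n (1-indexed, matching the
    convention that X = {1, ..., X})."""
    v = np.zeros(n)
    v[k - 1] = 1.0
    return v
```

For instance, `in_simplex([0.2, 0.3, 0.5])` holds, while `psd_geq(np.zeros((2, 2)), np.eye(2))` does not, since 0 − I has negative eigenvalues.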