Abstract. This paper considers discrete-time Markov control processes on Borel spaces, with possibly unbounded costs, and the long-run average cost (AC) criterion. Under appropriate hypotheses on weighted norms for the cost function and the transition law, the existence of solutions to the average cost optimality inequality and the average cost optimality equation is shown, which in turn yields the existence of AC-optimal and AC-canonical policies, respectively.

1991 Mathematics Subject Classification: 93E20, 90C40.

1. Introduction. Among the several approaches to proving the existence of average cost optimal (hereafter abbreviated AC-optimal) policies for Markov control processes (MCPs), two of the most widely used are the so-called vanishing discount approach and the approach based on strong ergodicity assumptions. In the former, the idea is to impose conditions on an associated β-discounted cost problem in such a way that, as β ↑ 1, we obtain in the limit either the average cost optimality inequality (ACOI) or the average cost optimality equation (ACOE), each of which in turn yields an AC-optimal policy (see e.g. [1, 7, 8, 9, 16, 24]). On the other hand, imposing strong ergodicity assumptions usually allows one to obtain the ACOE directly; this approach, however, has been mainly used for MCPs with bounded cost functions [1, 3, 6, 10]. In this paper we combine the two approaches to obtain the ACOI and
Abstract. This paper shows the convergence of the value iteration (or successive approximations) algorithm for average cost (AC) Markov control processes on Borel spaces, with possibly unbounded cost, under appropriate hypotheses on weighted norms for the cost function and the transition law. It is also shown that the aforementioned convergence implies strong forms of AC-optimality and the existence of forecast horizons.
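As an illustration of the value iteration scheme discussed in this abstract, the following is a minimal finite-state, finite-action sketch of relative value iteration for the AC criterion. This is illustrative only: the paper itself treats Borel spaces with possibly unbounded costs under weighted-norm hypotheses, and the data layout (`P`, `c`) and the normalization at a fixed reference state are choices of this sketch, not the paper's.

```python
import numpy as np

def relative_value_iteration(P, c, n_iter=500, tol=1e-10):
    """Relative value iteration for a finite average-cost MDP.

    P: array of shape (A, S, S), one row-stochastic transition matrix per action.
    c: array of shape (S, A), one-stage costs c(x, a).
    Returns the approximate optimal average cost g, a bias vector h,
    and a greedy stationary policy (argmin action per state).
    """
    A, S, _ = P.shape
    h = np.zeros(S)
    for _ in range(n_iter):
        # Bellman operator: (Th)(x) = min_a [ c(x, a) + sum_y P(y | x, a) h(y) ]
        Q = c + np.einsum('asy,y->sa', P, h)   # shape (S, A)
        Th = Q.min(axis=1)
        g = Th[0]                  # normalize at reference state 0
        h_new = Th - g
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    policy = Q.argmin(axis=1)
    return g, h, policy
```

Subtracting the value at a reference state keeps the iterates bounded, so the scheme converges (under suitable ergodicity conditions) even though the total cost itself grows linearly in the horizon.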
The paper deals with a class of discrete-time Markov control processes with Borel state and action spaces, and possibly unbounded one-stage costs. The processes are given by the recurrent equations x_{t+1} = F(x_t, a_t, ξ_t), t = 1, 2, …, with i.i.d. R^k-valued random vectors ξ_t whose density ρ is unknown. Assuming observability of ξ_t, and taking advantage of the procedure of statistical estimation of ρ used in a previous work by the authors, we construct an average cost optimal adaptive policy.
We study the stability of an optimal control of general Markov chains under perturbations of the transition probabilities. The criterion of optimality is the expected total discounted cost with a random rate of discounting. We give upper bounds for the stability index which are expressed in terms of the weighted total variation distance between the transition probabilities of the original and the perturbed process. In addition, we show how the inequalities found can be applied to estimate the robustness of the optimal control for some controlled queues and for a certain consumption-investment process.
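For a finite state space, one common form of such a weighted total variation distance between two transition kernels can be computed directly. The convention below (supremum over states of the w-weighted row distance, normalized by w(x)) is one standard choice and an assumption of this sketch, not necessarily the exact index used in the paper:

```python
import numpy as np

def weighted_tv(P, Q, w):
    """Weighted total-variation distance between finite transition kernels:

        sup_x (1/w(x)) * sum_y w(y) * |P(y|x) - Q(y|x)|

    P, Q: (S, S) row-stochastic matrices; w: strictly positive weight vector.
    """
    row_dist = np.abs(P - Q) @ w      # sum_y w(y) |P(y|x) - Q(y|x)|, per state x
    return float(np.max(row_dist / w))
```

With w ≡ 1 this reduces to twice the ordinary total variation distance between corresponding rows, maximized over states; a nonconstant w lets the distance penalize discrepancies more heavily in "expensive" states, matching the weighted-norm setting of unbounded costs.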
Mathematics Subject Classification: 90C40, 90C31
One usually defines an adaptive strategy to be a strategy for controlling a random process with an incompletely known probability description. Mostly, only the class of random processes to which the controlled process under examination belongs is indicated. The probability characteristics within the class are unknown but can be reconstructed by statistical methods from observation of a realization of the process. Different variants of adaptive control problems and methods for solving them have been presented in the books [1], [2]. A large number of works have been devoted to adaptive control of Markov chains with finite sets of states and controls. Adaptive strategies for processes whose spaces of states and controls are compact sets in R have been studied in [3], [4] and in certain other works. In this article a class of controlled Markov processes with discrete time, defined by the recurrent relation (1) x_t = F(x_{t-1}, a_t, ξ_t), t = 1, 2, …, is examined. Here the unknown "parameter" is the probability distribution of the random vectors ξ_t. The goal of the control is to maximize the average "revenue" per unit time. To construct optimal adaptive strategies, controls are used that are close to optimal ones in an ancillary model with a known probability description. This description is given by means of an empirical distribution constructed from realizations ξ_1, …, ξ_t. The results contained herein were published without proof in [5].
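The construction described above, recovering the disturbances ξ_t from observed transitions, building their empirical distribution, and controlling via an ancillary model with that distribution plugged in, can be illustrated with a deliberately simplified scalar example. The dynamics F, the disturbance law, and the certainty-equivalent rule below are all hypothetical choices of this sketch, not the article's model:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simplified scalar system x_t = F(x_{t-1}, a_t, xi_t) with additive noise.
def F(x, a, xi):
    return 0.5 * x + a + xi

def true_xi():
    return rng.normal(0.3, 1.0)   # the "unknown" disturbance distribution

samples = []                       # empirical record of observed disturbances
x, a = 0.0, 0.0
for t in range(2000):
    xi = true_xi()
    x_next = F(x, a, xi)
    # Since F is known, xi can be recovered exactly from the observed transition.
    samples.append(x_next - 0.5 * x - a)
    # Certainty-equivalent (adaptive) control: steer E[x_{t+1}] toward 0,
    # using the empirical mean of xi in place of the unknown true mean.
    a = -0.5 * x_next - np.mean(samples)
    x = x_next

mu_hat = float(np.mean(samples))   # empirical estimate of E[xi] (true value: 0.3)
```

As more realizations ξ_1, …, ξ_t accumulate, the empirical distribution converges to the true one, so the control computed in the ancillary model approaches the optimal control for the true system; this is the basic mechanism behind the adaptive strategies of the article.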