“…Under mild conditions [1], the algorithm can be shown to converge to the solution of Equation (3), thus yielding both the optimal policy π* and its associated optimal average cost.…”
Section: The Value Iteration Algorithm
“…Under general conditions [1], each control policy π incurs a finite long-term cost. The task of the decision maker is to identify a policy π that minimizes the long-term average cost, thus satisfying the expression below:…”
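The quoted expression itself is elided in the snippet. In standard average-cost MDP notation (a reconstruction from the surrounding text, not necessarily the paper's exact Equation (3)), the criterion being minimized typically reads:

```latex
\lambda_\pi \;=\; \limsup_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[\sum_{t=0}^{T-1} c\bigl(x_t, \pi(x_t)\bigr)\right],
\qquad
\lambda^* \;=\; \min_{\pi} \lambda_\pi ,
```

where $c(x,a)$ is the stage cost and $x_t$ the state under policy $\pi$.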
Section: Average Cost Markov Decision Processes
“…For more details on the convergence of VI algorithms for average cost MDPs, we refer to [1]. The unknown rate of convergence renders the results in [13] not directly applicable to the studied problem.…”
Section: Introductionmentioning
“…An elegant way to find the optimal control actions for each state is provided by the classical value or policy iteration algorithms [1-11]. The value iteration (VI) algorithm is arguably the most popular algorithm, in part because of its simplicity and ease of implementation.…”
This paper proposes a technique to accelerate the convergence of the value iteration algorithm applied to discrete average cost Markov decision processes. An adaptive partial information value iteration algorithm is proposed that updates an increasingly accurate approximate version of the original problem, with a view to saving computations in the early iterations, when one is typically far from the optimal solution. The proposed algorithm is compared to classical value iteration for a broad set of adaptive parameters, and the results suggest that significant computational savings can be obtained while also ensuring robust performance with respect to the parameters.
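For concreteness, the classical baseline that the paper accelerates can be sketched as relative value iteration for an average-cost MDP. The MDP data below (3 states, 2 actions) is purely illustrative and not taken from the paper:

```python
import numpy as np

# Hypothetical 3-state, 2-action average-cost MDP (illustrative data,
# not from the paper): P[a] is the transition matrix for action a,
# c[a, s] is the stage cost of action a in state s.
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]],
    [[0.5, 0.4, 0.1],
     [0.3, 0.3, 0.4],
     [0.1, 0.1, 0.8]],
])
c = np.array([
    [2.0, 1.0, 3.0],
    [1.5, 2.5, 0.5],
])

def relative_value_iteration(P, c, tol=1e-8, max_iter=10_000):
    """Classical relative value iteration for average-cost MDPs.

    Subtracts the value of a reference state (state 0) after each
    Bellman backup so the iterates stay bounded; the subtracted value
    converges to the optimal average cost for unichain MDPs.
    Returns (greedy policy, average-cost estimate).
    """
    n = P.shape[1]
    h = np.zeros(n)
    for _ in range(max_iter):
        # Bellman backup: Q[a, s] = c[a, s] + sum_s' P[a, s, s'] * h[s']
        Q = c + P @ h
        h_new = Q.min(axis=0)
        gain = h_new[0]        # average-cost estimate at the reference state
        h_new = h_new - gain   # relative values
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=0)
    return policy, gain

policy, gain = relative_value_iteration(P, c)
```

Each iteration performs one full Bellman backup over all states and actions; the paper's adaptive partial-information variant aims to cheapen exactly these early backups.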
“…Moreover, under the axiom of non-satiation, the consumer will spend all his wealth in the last period of his life span, and therefore W_{T+1} = 0. The problem (2)-(3) is a discrete-time stochastic control problem (see [10,11]). Now consider the case when the consumer has lived to period t and his wealth is W_t.…”
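The terminal condition described above (all wealth consumed by the last period, so W_{T+1} = 0) can be illustrated with a backward-induction sketch. All names, the log utility, and the parameters below are hypothetical, and the sketch is a deterministic simplification of the paper's stochastic problem:

```python
import numpy as np

# Hypothetical life-cycle consumption problem: at each period t the
# consumer splits wealth W between consumption C and savings, which
# earn interest r. The terminal condition W_{T+1} = 0 forces all
# remaining wealth to be consumed in the last period T.
T = 3                                 # last period of the life span
grid = np.linspace(0.1, 10.0, 200)    # wealth grid
beta, r = 0.95, 0.03                  # discount factor, interest rate

def u(C):
    return np.log(C)                  # hypothetical log utility

# Value at period T: spend everything, since W_{T+1} = 0.
V = u(grid)

# Backward induction from T-1 down to 0.
for t in range(T - 1, -1, -1):
    V_new = np.empty_like(V)
    for i, W in enumerate(grid):
        C = np.linspace(1e-3, W, 100)        # candidate consumption levels
        W_next = (1 + r) * (W - C)           # wealth carried forward
        V_next = np.interp(W_next, grid, V)  # interpolate next-period value
        V_new[i] = np.max(u(C) + beta * V_next)
    V = V_new
```

The resulting V is the period-0 value of wealth; it is nondecreasing in W, consistent with non-satiation.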
Section: Utility Maximization Under Random Life Span and Uncertain Income
This paper extends Slutsky's classic work on consumer theory to a random horizon stochastic dynamic framework in which the consumer has an inter-temporal planning horizon with uncertainties in future incomes and life span. Utility maximization leading to a set of ordinary wealth-dependent demand functions is performed. A dual problem is set up to derive the wealth compensated demand functions. This represents the first time that wealth-dependent ordinary demand functions and wealth compensated demand functions are obtained under these uncertainties. The corresponding Roy's identity relationships and a set of random horizon stochastic dynamic Slutsky equations are then derived. The extension incorporates realistic characteristics in consumer theory and advances the conventional microeconomic study on consumption to a more realistic optimal control framework.