“…Under mild conditions [1], the algorithm can be shown to converge to the solution of Equation (3), thus yielding both the optimal policy π* and its associated optimal average cost.…”
Section: The Value Iteration Algorithm
“…Under general conditions [1], each control policy π incurs a finite long-term cost. The task of the decision maker is to identify a policy π that minimizes the long-term average cost, thus satisfying the expression below:…”
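The quoted expression itself is elided in the snippet. In standard average-cost MDP notation (a reconstruction from the surrounding text, not necessarily the paper's exact Equation (3)), the criterion being minimized typically reads:

```latex
\lambda_\pi \;=\; \limsup_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[\sum_{t=0}^{T-1} c\bigl(x_t, \pi(x_t)\bigr)\right],
\qquad
\lambda^* \;=\; \min_{\pi} \lambda_\pi ,
```

where $c(x,a)$ is the stage cost and $x_t$ the state under policy $\pi$.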
Section: Average Cost Markov Decision Processes
“…For more details on the convergence of VI algorithms for average cost MDPs, we refer to [1]. The unknown rate of convergence renders the results in [13] not directly applicable to the studied problem.…”
Section: Introductionmentioning
“…An elegant way to find the optimal control actions for each state is provided by the classical value or policy iteration algorithms [1-11]. The value iteration (VI) algorithm is arguably the most popular algorithm, in part because of its simplicity and ease of implementation.…”
This paper proposes a technique to accelerate the convergence of the value iteration algorithm applied to discrete average cost Markov decision processes. An adaptive partial information value iteration algorithm is proposed that updates an increasingly accurate approximate version of the original problem, with a view to saving computations in the early iterations, when one is typically far from the optimal solution. The proposed algorithm is compared to classical value iteration for a broad set of adaptive parameters, and the results suggest that significant computational savings can be obtained while also ensuring robust performance with respect to the parameters.
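For concreteness, the classical baseline that the paper accelerates can be sketched as relative value iteration for an average-cost MDP. The MDP data below (3 states, 2 actions) is purely illustrative and not taken from the paper:

```python
import numpy as np

# Hypothetical 3-state, 2-action average-cost MDP (illustrative data,
# not from the paper): P[a] is the transition matrix for action a,
# c[a, s] is the stage cost of action a in state s.
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]],
    [[0.5, 0.4, 0.1],
     [0.3, 0.3, 0.4],
     [0.1, 0.1, 0.8]],
])
c = np.array([
    [2.0, 1.0, 3.0],
    [1.5, 2.5, 0.5],
])

def relative_value_iteration(P, c, tol=1e-8, max_iter=10_000):
    """Classical relative value iteration for average-cost MDPs.

    Subtracts the value of a reference state (state 0) after each
    Bellman backup so the iterates stay bounded; the subtracted value
    converges to the optimal average cost for unichain MDPs.
    Returns (greedy policy, average-cost estimate).
    """
    n = P.shape[1]
    h = np.zeros(n)
    for _ in range(max_iter):
        # Bellman backup: Q[a, s] = c[a, s] + sum_s' P[a, s, s'] * h[s']
        Q = c + P @ h
        h_new = Q.min(axis=0)
        gain = h_new[0]        # average-cost estimate at the reference state
        h_new = h_new - gain   # relative values
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=0)
    return policy, gain

policy, gain = relative_value_iteration(P, c)
```

Each iteration performs one full Bellman backup over all states and actions; the paper's adaptive partial-information variant aims to cheapen exactly these early backups.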
“…Moreover, under the axiom of non-satiation, the consumer will spend all his wealth in the last period of his life span, and therefore W_{T+1} = 0. The problem (2)-(3) is a discrete-time stochastic control problem (see [10,11]). Now consider the case when the consumer has lived to period t and his wealth is W_t.…”
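The terminal condition described above (all wealth consumed by the last period, so W_{T+1} = 0) can be illustrated with a backward-induction sketch. All names, the log utility, and the parameters below are hypothetical, and the sketch is a deterministic simplification of the paper's stochastic problem:

```python
import numpy as np

# Hypothetical life-cycle consumption problem: at each period t the
# consumer splits wealth W between consumption C and savings, which
# earn interest r. The terminal condition W_{T+1} = 0 forces all
# remaining wealth to be consumed in the last period T.
T = 3                                 # last period of the life span
grid = np.linspace(0.1, 10.0, 200)    # wealth grid
beta, r = 0.95, 0.03                  # discount factor, interest rate

def u(C):
    return np.log(C)                  # hypothetical log utility

# Value at period T: spend everything, since W_{T+1} = 0.
V = u(grid)

# Backward induction from T-1 down to 0.
for t in range(T - 1, -1, -1):
    V_new = np.empty_like(V)
    for i, W in enumerate(grid):
        C = np.linspace(1e-3, W, 100)        # candidate consumption levels
        W_next = (1 + r) * (W - C)           # wealth carried forward
        V_next = np.interp(W_next, grid, V)  # interpolate next-period value
        V_new[i] = np.max(u(C) + beta * V_next)
    V = V_new
```

The resulting V is the period-0 value of wealth; it is nondecreasing in W, consistent with non-satiation.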
Section: Utility Maximization Under Random Life Span and Uncertain Income
This paper extends Slutsky's classic work on consumer theory to a random horizon stochastic dynamic framework in which the consumer has an inter-temporal planning horizon with uncertainties in future incomes and life span. Utility maximization leading to a set of ordinary wealth-dependent demand functions is performed. A dual problem is set up to derive the wealth compensated demand functions. This represents the first time that wealth-dependent ordinary demand functions and wealth compensated demand functions are obtained under these uncertainties. The corresponding Roy's identity relationships and a set of random horizon stochastic dynamic Slutsky equations are then derived. The extension incorporates realistic characteristics in consumer theory and advances the conventional microeconomic study on consumption to a more realistic optimal control framework.