Simulation-based Algorithms for Markov Decision Processes

Chang, Hyeong Soo; Hu, Jiaqiao; Fu, Michael C.; Marcus, Steven I.

doi:10.1007/978-1-84628-690-2

Cited by 168 publications

(93 citation statements)

References 0 publications

“…Even if the spirit of improving all policies in a set is similar to parallel rollout in unconstrained MDPs (Chang, Hu, Fu, & Marcus, 2013), the approach here is different due to the constrained setting. The parallel rollout method for a set of policies uses the maximum value function over the value functions of the policies in the set at each possible next state when it obtains an action prescribed by an improving policy at a state.…”

Section: Multi-policy Improvement

mentioning

confidence: 98%

Random search for constrained Markov decision processes with multi-policy improvement

Chang

2015

Automatica

Self Cite
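
The parallel rollout step described in the statement above can be sketched for a small finite MDP. This is only an illustration under stated assumptions, not the book's algorithm as published: it assumes an infinite-horizon discounted MDP with known transition probabilities `P[s][a][t]` and rewards `R[s][a]`, and it evaluates each base policy by exact iterative sweeps rather than by simulation. All names (`evaluate_policy`, `parallel_rollout_policy`, the toy `P` and `R`) are hypothetical.

```python
def evaluate_policy(P, R, policy, gamma, sweeps=500):
    """Iterative policy evaluation for a fixed deterministic policy."""
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * V[t] for t in range(n))
            for s in range(n)
        ]
    return V


def parallel_rollout_policy(P, R, policies, gamma):
    """Act greedily with respect to the pointwise maximum of the base value functions."""
    n = len(P)
    values = [evaluate_policy(P, R, pi, gamma) for pi in policies]
    # Maximum over the base policies' values, taken separately at each state.
    v_max = [max(V[s] for V in values) for s in range(n)]
    improved = []
    for s in range(n):
        q = [
            R[s][a] + gamma * sum(P[s][a][t] * v_max[t] for t in range(n))
            for a in range(len(P[s]))
        ]
        improved.append(max(range(len(q)), key=q.__getitem__))
    return improved


# Toy MDP (assumed for illustration): action 0 leads to state 0, action 1 to state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
R = [[1.0, 0.0], [0.0, 2.0]]
gamma = 0.9
base_policies = [[0, 0], [0, 1]]

improved = parallel_rollout_policy(P, R, base_policies, gamma)
improved_value = evaluate_policy(P, R, improved, gamma)
base_values = [evaluate_policy(P, R, pi, gamma) for pi in base_policies]
print(improved, improved_value)
```

In this toy case neither base policy ever moves from state 0 to state 1, yet the combined policy does, because the one-step lookahead sees the higher value that the second base policy attains at state 1.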


“…Reference [37] gives a survey of EC applied to noisy environments. Recent work by [13] and [12] has produced provably convergent algorithms for solving Markov Decision Processes. Reference [32] extend this work to solving problems with the form of (1).…”

Section: Stochastic Combinatorial Optimization Literature

mentioning

confidence: 99%

“…Evolutionary Policy Iteration (EPI) was proposed in [13] and [12]. It was suggested as a method to find optimal policies in Markov decision processes (MDP).…”

Section: Competing Algorithms

mentioning

confidence: 99%

“…Reference [12] notes that E C(ρ1(x)) could be computed via simulation, but does not offer a sampling routine or convergence results. Since E C(ρ1(x)) can only be approximated via Monte Carlo simulations in this application, this section will provide a modification of Algorithm 2 to handle this situation and provides a proof of convergence for the resulting algorithm.…”

mentioning

confidence: 99%

“…Both proposals of EPI [12,13] assume that E C(ρ1(x)) can be computed without error for every portfolio x. Reference [12] notes that E C(ρ1(x)) could be computed via simulation, but does not offer a sampling routine or convergence results.…”

mentioning

confidence: 99%

One-stage R&D portfolio optimization with an application to solid oxide fuel cells

2010

This paper provides an overview of the one-stage R&D portfolio optimization problem. It provides a novel problem model that can be solved with stochastic combinatorial optimization methods. Current solution methods are reviewed and a new method that scales to large problems, Stochastic Gradient Portfolio Optimization (SGPO), is proposed. Although SGPO is a heuristic method, we prove global convergence in certain conditions. SGPO is numerically compared to current optimization methods on a test case involving Solid Oxide Fuel Cells.
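
The citation statements for this paper note that the expected cost of a portfolio x can only be approximated by Monte Carlo simulation. A minimal sketch of such a sample-average estimate follows. It does not reproduce the paper's modified algorithm or its convergence proof; the cost simulator `simulate_cost` and its noise model are hypothetical stand-ins chosen only to make the example runnable.

```python
import random
from statistics import mean, stdev


def simulate_cost(x, rng):
    """Hypothetical noisy cost of portfolio x: each funded project costs about 1."""
    return sum(xi * rng.gauss(1.0, 0.5) for xi in x)


def estimate_expected_cost(x, n_samples, seed=0):
    """Sample-average estimate of the expected cost, with its standard error."""
    rng = random.Random(seed)
    samples = [simulate_cost(x, rng) for _ in range(n_samples)]
    estimate = mean(samples)
    std_error = stdev(samples) / n_samples ** 0.5
    return estimate, std_error


# Portfolio funding projects 0 and 2; under the assumed model the true mean is 2.0.
portfolio = (1, 0, 1)
estimate, std_error = estimate_expected_cost(portfolio, n_samples=10_000)
print(estimate, std_error)
```

The standard error shrinks in proportion to one over the square root of the sample count, which is why an algorithm that compares portfolios using such estimates needs a sampling routine that controls this error.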

Approximate Dynamic Programming I: Modeling

Powell

2011

Wiley Encyclopedia of Operations Research and Management Science

The first step in solving a stochastic optimization problem is providing a mathematical model. How the problem is modeled can impact the solution strategy. In this article, we provide a flexible modeling framework that uses a classic control‐theoretic framework, avoiding devices such as one‐step transition matrices. We describe the five fundamental elements of any stochastic, dynamic program. Different notational conventions are introduced, and the types of policies that can be used to guide decisions are described in detail. This discussion puts approximate dynamic programming in the context of a variety of other algorithmic strategies by using the modeling framework to describe a wide range of policies. A brief discussion of model‐free programming is also provided.
