Feature Article—Merging AI and OR to Solve High-Dimensional Stochastic Optimization Problems Using Approximate Dynamic Programming

Powell, Warren B.

doi:10.1287/ijoc.1090.0349

Cited by 38 publications

(13 citation statements)

References 38 publications

(33 reference statements)


“…These two tasks are investigated experimentally in Section 5 below, using the primary and secondary intuitions just described. The second variety of tasks includes resource management scenarios, where stocks and flows must be controlled in order to maximize profits and minimize costs [13]. In these scenarios, aspects of the estimated state can be the levels of various resources, their prices, environmental conditions such as weather, and so on.…”

Section: Q(s, a) = Q(ŝ) = Σ_{j=1}^{M} … (mentioning)

confidence: 99%

Policy Iteration Based on a Learned Transition Model

Ramavajjala

Elkan

2012

Machine Learning and Knowledge Discovery in Databases


Abstract. This paper investigates a reinforcement learning method that combines learning a model of the environment with least-squares policy iteration (LSPI). The LSPI algorithm learns a linear approximation of the optimal state-action value function; the idea studied here is to let this value function depend on a learned estimate of the expected next state instead of directly on the current state and action. This approach makes it easier to define useful basis functions, and hence to learn a useful linear approximation of the value function. Experiments show that the new algorithm, called NSPI for next-state policy iteration, performs well on two standard benchmarks, the well-known mountain car and inverted pendulum swing-up tasks. More importantly, the NSPI algorithm performs well, and better than a specialized recent method, on a resource management task known as the day-ahead wind commitment problem. This latter task has action and state spaces that are high-dimensional and continuous.
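
The central idea in this abstract, a value function that depends on a predicted next state rather than on the state-action pair directly, can be illustrated with a small sketch. This is not the NSPI implementation from the paper: the linear transition model, the quadratic basis, the toy dynamics s' = s + a, and all function names are assumptions made for illustration, and the value-function weights are hand-picked rather than learned by LSPI.

```python
import numpy as np

def fit_transition_model(states, actions, next_states):
    """Least-squares fit of next_state ~ [state, action, 1] @ W (assumed linear model)."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    return W

def predict_next_state(W, state, action):
    """Predicted next state s_hat for one state-action pair."""
    x = np.concatenate([state, action, [1.0]])
    return x @ W

def basis(s_hat):
    """Hypothetical basis functions of s_hat: bias, linear, and squared terms."""
    return np.concatenate([[1.0], s_hat, s_hat ** 2])

def q_value(weights, W, state, action):
    """Q(s, a) = Q(s_hat) = sum_j weights[j] * basis_j(s_hat)."""
    return float(weights @ basis(predict_next_state(W, state, action)))

def greedy_action(weights, W, state, candidate_actions):
    """Pick the candidate action whose predicted next state has the highest value."""
    values = [q_value(weights, W, state, a) for a in candidate_actions]
    return candidate_actions[int(np.argmax(values))]

# Toy data from assumed dynamics s' = s + a (one-dimensional state and action).
rng = np.random.default_rng(0)
states = rng.uniform(-3, 3, size=(50, 1))
actions = rng.uniform(-1, 1, size=(50, 1))
next_states = states + actions
W = fit_transition_model(states, actions, next_states)

# Hand-picked weights encoding Q(s_hat) = -s_hat**2, so states near 0 are preferred.
weights = np.array([0.0, 0.0, -1.0])
candidates = [np.array([-1.0]), np.array([0.0]), np.array([1.0])]
best = greedy_action(weights, W, np.array([2.0]), candidates)
print(best)
```

Because the basis functions take only the predicted next state as input, they never have to encode how actions interact with states; that burden falls on the transition model, which is the simplification the abstract describes.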


“…In operations research (and I believe that this is often true in AI), the state is generally viewed as a description of a snapshot of the "system," which might be the location of a vehicle, the amount in inventory, or the trajectory of a helicopter. In my single-trucker example in the paper in §3 (Powell 2010), it would include not only the position of the truck but also the additional information about loads to be moved. This illustrates that the state variable has to cover all the information you need to make a decision, a concept that appears to be well defined and understood in the systems and controls community.…”

Section: Comment (mentioning)

confidence: 99%

Rejoinder—The Languages of Stochastic Optimization

Powell

2010

INFORMS Journal on Computing

Self Cite


“…In reinforcement learning, most action spaces are finite and small, so little attention has been given to maintaining convexity of Q(S, a) in a when the action space is continuous, vector-valued and convex. However, vector-valued action spaces can be handled by retaining convexity along with the use of the post-decision state variable, as illustrated in [53].…”

Section: Introduction Stochastic Search Seeks To Find a Set Of Contr… (mentioning)

confidence: 99%

Semiconvex Regression for Metamodeling-Based Optimization

Hannah

Powell

Dunson

2014

SIAM J. Optim.

Self Cite


Stochastic search involves finding a set of controllable parameters that minimizes an unknown objective function using a set of noisy observations. We consider the case when the unknown function is convex and a metamodel is used as a surrogate objective function. Often the data are non-i.i.d. and include an observable state variable, such as applicant information in a loan rate decision problem. State information is difficult to incorporate into convex models. We propose a new semi-convex regression method that is used to produce a convex metamodel in the presence of a state variable. We show consistency for this method. We demonstrate its effectiveness for metamodeling on a set of synthetic inventory management problems and a large, real-life auto loan dataset.

