There has been a great deal of research recently on dynamic programming methods that replace the optimal cost-to-go function with a suitable approximation. These methods are collectively known as neuro-dynamic programming or reinforcement learning, and are described in a number of sources, including the books by Bertsekas and Tsitsiklis (1996) and Sutton and Barto (1998). In this paper, we provide an overview of the major conceptual issues, and we survey a number of recent developments, including rollout algorithms, which are related to recent advances in model predictive control for chemical processes.
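The rollout idea mentioned above can be illustrated with a minimal sketch: at each state, try each control for one step and then simulate a base heuristic for the remaining horizon, picking the control with the smallest total cost. The toy chain problem, and the names `step`, `controls`, and `base_policy`, are illustrative assumptions, not from the paper.

```python
# Minimal one-step rollout sketch (illustrative toy problem, not from the paper).

def rollout_control(state, controls, step, base_policy, horizon):
    """Pick the control whose one-step cost, plus the cost of following
    the base heuristic for the remaining steps, is smallest."""
    best_u, best_cost = None, float("inf")
    for u in controls(state):
        cost, s = step(state, u)          # one step with the candidate control
        for _ in range(horizon - 1):      # then follow the base heuristic
            c, s = step(s, base_policy(s))
            cost += c
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

# Toy example: walk on the integers toward 0; the stage cost is |position|.
controls = lambda s: (-1, +1)
step = lambda s, u: (abs(s + u), s + u)
base_policy = lambda s: -1                # naive heuristic: always step left
u = rollout_control(3, controls, step, base_policy, horizon=5)   # u == -1
```

Note that at state -3 the rollout policy chooses +1, correcting the base heuristic's bias: the one-step lookahead combined with base-policy simulation yields a policy at least as good as the heuristic alone.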
This paper shows, by means of an operator called a splitting operator, that the Douglas-Rachford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm. Therefore, applications of Douglas-Rachford splitting, such as the alternating direction method of multipliers for convex programming decomposition, are also special cases of the proximal point algorithm. This observation allows the unification and generalization of a variety of convex programming algorithms. By introducing a modified version of the proximal point algorithm, we derive a new, generalized alternating direction method of multipliers for convex programming. Advances of this sort illustrate the power and generality gained by adopting monotone operator theory as a conceptual framework.
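To make the proximal point iteration concrete, here is a minimal scalar sketch: each step minimizes the objective plus a quadratic penalty on the distance to the current iterate. The choice f(x) = |x|, whose proximal operator is soft-thresholding, is my own illustration, not taken from the paper.

```python
# Proximal point iteration sketch on a scalar convex function (illustrative).

def prox_abs(v, lam):
    """Proximal operator of f(x) = |x|:
    argmin_x |x| + (1/(2*lam)) * (x - v)**2, i.e. soft-thresholding."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

x, lam = 5.0, 1.0
for _ in range(10):
    x = prox_abs(x, lam)   # proximal point step: x_{k+1} = prox_{lam*f}(x_k)
# iterates 5, 4, 3, 2, 1, 0, 0, ... reach the minimizer x* = 0
```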
We present a model for asynchronous distributed computation and then proceed to analyze the convergence of natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms. We show that such algorithms retain the desirable convergence properties of their centralized counterparts, provided that the time between consecutive interprocessor communications and the communication delays are not too large.
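The effect of bounded communication delays can be mimicked in a few lines: a gradient step that uses a stale iterate at most a fixed number of updates old. The objective f(x) = x², the stepsize, and the delay pattern are illustrative assumptions; the point is only that convergence survives when the staleness bound and stepsize are compatible.

```python
# Sketch of gradient descent with bounded staleness (illustrative model of
# delayed communication; f(x) = x**2 and the constants are my own choices).
import collections

grad = lambda x: 2.0 * x              # gradient of f(x) = x**2
x, step, max_delay = 10.0, 0.1, 3
history = collections.deque([x], maxlen=max_delay + 1)
for _ in range(200):
    stale_x = history[0]              # oldest stored iterate: delay <= max_delay
    x = x - step * grad(stale_x)      # update uses an outdated gradient
    history.append(x)
# x decays toward the minimizer 0 despite every update using a stale gradient
```

With too large a stepsize (or too long a delay) the same recurrence oscillates and diverges, which matches the abstract's caveat that the delays must not be too large.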
We propose a class of algorithms for finding an optimal quasi-static routing in a communication network. The algorithms are based on Gallager's method [1] and provide methods for iteratively updating the routing table entries of each node in a manner that guarantees convergence to a minimum delay routing. Their main feature is that they utilize second derivatives of the objective function and may be viewed as approximations to a constrained version of Newton's method. The use of second derivatives results in improved speed of convergence and automatic stepsize scaling with respect to the level of traffic input. These advantages are of crucial importance for the practical implementation of the algorithm using distributed computation in an environment where input traffic statistics gradually change.
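The benefit of second-derivative scaling can be seen on a toy problem: dividing each gradient component by the corresponding diagonal second derivative gives a step whose size adapts automatically to the curvature, with no hand-tuned stepsize. The badly scaled quadratic below is my own illustration, not the routing objective of the paper.

```python
# Sketch: diagonally scaled (approximate Newton) step on a badly scaled
# quadratic f(x) = a1*x1**2 + a2*x2**2 (illustrative, not the paper's model).

a = [1.0, 100.0]                          # curvatures differ by a factor of 100
grad = lambda x: [2 * ai * xi for ai, xi in zip(a, x)]
hess_diag = lambda x: [2 * ai for ai in a]

x = [1.0, 1.0]
g, h = grad(x), hess_diag(x)
x = [xi - gi / hi for xi, gi, hi in zip(x, g, h)]   # one scaled step
# reaches the minimizer [0, 0] in a single iteration, for any curvature spread
```

A fixed-stepsize gradient method on the same function must use a step small enough for the stiffest coordinate and therefore crawls along the other one; the scaled step avoids that tuning entirely, which is the "automatic stepsize scaling" advantage cited above.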
We consider discrete-time infinite horizon deterministic optimal control problems with nonnegative cost per stage, and a destination that is cost-free and absorbing. The classical linear-quadratic regulator problem is a special case. Our assumptions are very general, and allow the possibility that the optimal policy may not stabilize the system, e.g., may not reach the destination either asymptotically or in a finite number of steps. We introduce a new unifying notion of stable feedback policy, based on perturbation of the cost per stage, which in addition to implying convergence of the generated states to the destination, quantifies the speed of convergence. We consider the properties of two distinct cost functions: J*, the overall optimal, and Ĵ, the restricted optimal over just the stable policies. Different classes of stable policies (with different speeds of convergence) may yield different values of Ĵ. We show that for any class of stable policies, Ĵ is a solution of Bellman's equation, and we characterize the smallest and the largest solutions: they are J* and J+, the restricted optimal cost function over the class of (finitely) terminating policies. We also characterize the regions of convergence of various modified versions of value and policy iteration algorithms, as substitutes for the standard algorithms, which may not work in general.
Our terminology aims to emphasize the connection with classical problems of control where X and U are the finite-dimensional Euclidean spaces X = ℜ^n, U = ℜ^m, and the destination is identified with the origin of ℜ^n. There the essence of the problem is to reach or asymptotically approach the origin at minimum cost. A special case is the classical infinite horizon linear-quadratic regulator problem.
However, our formulation also includes shortest path problems with continuous as well as discrete spaces; for example the classical shortest path problem, where X consists of the nodes of a directed graph, and the problem is to reach the destination from every other node with a minimum length path.
We are interested in feedback policies of the form π = {µ_0, µ_1, . . .}, where each µ_k is a function mapping x ∈ X into the control µ_k(x) ∈ U(x). The set of all policies is denoted by Π. Policies of the form π = {µ, µ, . . .} are called stationary, and will be denoted by µ, when confusion cannot arise.
Given an initial state x_0, a policy π = {µ_0, µ_1, . . .}, when applied to the system (1.1), generates a unique sequence of state-control pairs (x_k, µ_k(x_k)), k = 0, 1, . . ., with cost
J_π(x_0) = Σ_{k=0}^∞ g(x_k, µ_k(x_k)),
where g is the cost per stage [the series converges to some number in [0, ∞] thanks to the nonnegativity assumption (1.2)]. We view J_π as a function over X, and we refer to it as the cost function of π. For a stationary policy µ, the corresponding cost function is denoted by J_µ. The optimal cost function is defined as
J*(x) = inf_{π ∈ Π} J_π(x),   x ∈ X,
and a policy π* is said to be optimal if J_π*(x) = J*(x) for all x ∈ X. The optimal cost J*(x) is identical to the optimal cost attained wh...
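For the classical shortest path special case just described, Bellman's equation J(x) = min_u [g(x, u) + J(f(x, u))] can be solved by value iteration. The toy graph below is my own illustration; node "t" plays the role of the cost-free absorbing destination.

```python
# Value iteration sketch for a deterministic shortest path problem
# (toy graph is illustrative; "t" is the cost-free absorbing destination).

graph = {                      # node -> {successor: arc length}
    "a": {"b": 1.0, "t": 4.0},
    "b": {"t": 1.0},
    "t": {},                   # destination: absorbing, cost-free
}

J = {x: 0.0 if x == "t" else float("inf") for x in graph}
for _ in range(len(graph)):    # |X| sweeps suffice on this finite graph
    for x, succ in graph.items():
        if succ:               # Bellman update: J(x) = min_u [g(x,u) + J(f(x,u))]
            J[x] = min(g + J[y] for y, g in succ.items())
# J["a"] == 2.0 (path a -> b -> t), J["b"] == 1.0, J["t"] == 0.0
```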
We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we establish the convergence properties of a number of variants of incremental subgradient methods, including some that are stochastic. Based on the analysis and computational experiments, the methods appear very promising and effective for important classes of large problems. A particularly interesting discovery is that by randomizing the order of selection of component functions for iteration, the convergence rate is substantially improved.
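The incremental scheme with randomized component order can be sketched in a few lines. The component functions |x - c_i| (minimized jointly at the median of the c_i) and the diminishing stepsize rule are illustrative choices of mine, not taken from the paper.

```python
# Incremental subgradient sketch for f(x) = sum_i |x - c_i| (illustrative).
import random

centers = [1.0, 2.0, 9.0]          # f is minimized at the median, x* = 2
sub = lambda x, c: 1.0 if x > c else (-1.0 if x < c else 0.0)  # subgradient of |x - c|

random.seed(0)
x = 0.0
for k in range(1, 2001):
    step = 1.0 / k                                            # diminishing stepsize
    order = random.sample(range(len(centers)), len(centers))  # randomized order
    for i in order:
        x -= step * sub(x, centers[i])   # one step per component, in sequence
# x ends up close to the minimizer 2.0
```

Each pass adjusts x after every component rather than summing all subgradients first, which is exactly the incremental idea; the `random.sample` call implements the randomized order of selection discussed in the abstract.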