Point-wise maximum approach to approximate dynamic programming

We describe a nonlinear generalization of dual dynamic programming theory and its application to value function estimation for deterministic control problems over continuous state and action spaces, in a discrete-time infinite horizon setting. We prove, using a Benders-type argument leveraging the monotonicity of the Bellman operator, that the result of a one-stage policy evaluation can be used to produce nonlinear lower bounds on the optimal value function that are valid over the entire state space. These bounds contain terms reflecting the functional form of the system's costs, dynamics, and constraints. We provide an iterative algorithm that produces successively better approximations of the optimal value function, and prove under certain assumptions that it achieves an arbitrarily low desired Bellman optimality tolerance at pre-selected points in the state space, in a finite number of iterations. We also describe means of certifying the quality of the approximate value function generated. We demonstrate the efficacy of the approach on systems whose dimensions are too large for conventional dynamic programming approaches to be practical.
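The generalized dual dynamic programming abstract above builds lower bounds from a one-stage Bellman evaluation, combines them by a point-wise maximum, and refines them at pre-selected states. What follows is a minimal sketch of that loop under stated assumptions. It uses a hypothetical scalar linear system with quadratic costs, where `A`, `B`, `Q`, `R`, `GAMMA` are made-up values. Because the optimal value function of this problem is convex, affine Benders-type cuts suffice here. The paper's bounds are nonlinear and its setting is far more general, so this only illustrates the monotonicity argument (if V ≤ V*, then T V ≤ V*). It is not the authors' algorithm.

```python
import itertools

# Hypothetical scalar test problem (not taken from the papers above):
# x+ = A*x + B*u, stage cost Q*x^2 + R*u^2, discount GAMMA. Needs B != 0, R > 0.
A, B, Q, R, GAMMA = 1.2, 1.0, 1.0, 1.0, 0.9


def lower_bound(cuts, y):
    """Point-wise maximum of the affine cuts c + s*y."""
    return max(c + s * y for c, s in cuts)


def bellman_rhs(x, u, cuts):
    """Stage cost plus the discounted lower bound at the successor state."""
    return Q * x * x + R * u * u + GAMMA * lower_bound(cuts, A * x + B * u)


def bellman_update(x, cuts):
    """Evaluate (T V)(x) exactly, returning the value and the minimizing input.

    The objective is convex and piecewise quadratic in u, so its minimizer is
    either a stationary point of one affine piece or a kink between two pieces.
    """
    candidates = [-GAMMA * s * B / (2.0 * R) for _, s in cuts]
    for (c1, s1), (c2, s2) in itertools.combinations(cuts, 2):
        if s1 != s2:
            y_kink = (c2 - c1) / (s1 - s2)
            candidates.append((y_kink - A * x) / B)
    u_star = min(candidates, key=lambda u: bellman_rhs(x, u, cuts))
    return bellman_rhs(x, u_star, cuts), u_star


def add_cut(x, cuts):
    """Append an affine lower bound on T V (hence on V*) that is tight at x."""
    value, u_star = bellman_update(x, cuts)
    # Optimality in u fixes the slope of V used at x+: s = -2*R*u*/(GAMMA*B).
    # A subgradient of T V at x is then 2*Q*x + GAMMA*A*s.
    slope = 2.0 * Q * x - 2.0 * A * R * u_star / B
    cuts.append((value - slope * x, slope))


def riccati_p(iters=500):
    """Discounted scalar Riccati fixed point, V*(x) = p*x^2 (used only to check)."""
    p = 0.0
    for _ in range(iters):
        p = Q + GAMMA * A * A * p - (GAMMA * A * B * p) ** 2 / (R + GAMMA * B * B * p)
    return p


cuts = [(0.0, 0.0)]  # V >= 0 is valid because the stage cost is non-negative
samples = [-2.0, -1.0, 0.0, 1.0, 2.0]  # pre-selected states
for _ in range(8):
    for x in samples:
        add_cut(x, cuts)

p_star = riccati_p()
gaps = [bellman_update(x, cuts)[0] - lower_bound(cuts, x) for x in samples]
print(f"p* = {p_star:.4f}, V(1) = {lower_bound(cuts, 1.0):.4f}, max gap = {max(gaps):.2e}")
```

Each cut is valid over the whole state space, not only near its sample point, because T V is convex in this example and the Bellman operator is monotone. The exact minimization over u matters: a grid search can overestimate T V and so produce a cut that is not a valid lower bound.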

“…The dynamics f(x, u) take one of the following input-affine forms: (a) If in (7) we have R kl 0 for each index kl, then…”

Section: Restriction of Problem Class (mentioning)

Generalized Dual Dynamic Programming for Infinite Horizon Problems in Continuous State and Action Spaces

Warrington, Beuchat, Lygeros

2019

IEEE Trans. Automat. Contr.

Self Cite

“…The benefit of a pointwise maximum combination is empirically demonstrated in [20] for a simple example, with the set of state-relevance weighting parameters hand-picked using problem-specific insight. In our previous work [27], we proposed a problem formulation with the point-wise maximum combination used in the Bellman inequality. The formulation was used to develop an iterative algorithm for computing lower bounding approximate value functions, however, the quality of the approximation, comparable with that of [20], still relies on the designer choosing a sequence of state-relevance weightings.…”

Section: B. Prior Work (mentioning)

“…We propose using a gradient ascent algorithm to address the non-convex point-wise maximum objective, and combine this with the algorithm proposed in [27] for computing a family of approximate value function whose point-wise maximum combination satisfies the Bellman inequality. The benefits of gradient ascent in this setting are two fold: 1) At each iteration of the gradient ascent algorithm the objective function is linear in the coefficients of the approximate value function and hence the computation requirements are comparable with existing methods; 2) The computation of a gradient direction has the interpretation of reducing the support of the state-relevance weighting distribution to a region of the state space that is relevant for the current iteration.…”

Section: Contributions and Outline (mentioning)

Accelerated Point-Wise Maximum Approach to Approximate Dynamic Programming

Beuchat, Warrington, Lygeros

2022

IEEE Trans. Automat. Contr.

Self Cite

We describe an approximate dynamic programming approach to compute lower bounds on the optimal value function for a discrete time, continuous space, infinite horizon setting. The approach iteratively constructs a family of lower bounding approximate value functions by using the so-called Bellman inequality. The novelty of our approach is that, at each iteration, we aim to compute an approximate value function that maximizes the point-wise maximum taken with the family of approximate value functions computed thus far. This leads to a non-convex objective, and we propose a gradient ascent algorithm to find stationary points by solving a sequence of convex optimization problems. We provide convergence guarantees for our algorithm and an interpretation for how the gradient computation relates to the state-relevance weighting parameter appearing in related approximate dynamic programming approaches. We demonstrate through numerical examples that, when compared to existing approaches, the algorithm we propose computes tighter suboptimality bounds with comparable computation time.
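The abstract above maximizes a non-convex point-wise maximum objective by gradient ascent. Suppose, as an assumption, that the objective is a sample average of max(V_prev(x), θ·φ(x)) over states drawn from a state-relevance distribution c. Then its gradient in θ (away from ties) is the average of φ(x) over only those samples where the new function is strictly larger. This is one reading of the quoted interpretation of "reducing the support of the state-relevance weighting distribution". The sketch below computes that gradient. The basis `basis`, the samples, and the previous bound `v_prev` are hypothetical. The convex step that would then maximize grad·θ subject to the Bellman inequality is not shown.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def basis(x):
    """Hypothetical quadratic basis phi(x) = [1, x, x^2]."""
    return [1.0, x, x * x]


def v_prev(x):
    """Hypothetical point-wise maximum of previously computed lower bounds."""
    return 0.5


def objective(theta, samples):
    """Sample average of max(V_prev(x), theta . phi(x)) under the weighting c."""
    return sum(max(v_prev(x), dot(theta, basis(x))) for x in samples) / len(samples)


def gradient(theta, samples):
    """Gradient in theta (valid away from ties) and the samples where theta . phi is active."""
    active = [x for x in samples if dot(theta, basis(x)) > v_prev(x)]
    grad = [0.0] * len(theta)
    for x in active:
        grad = [g + p for g, p in zip(grad, basis(x))]
    # Only active samples contribute: the effective state-relevance weighting
    # is c restricted to the region where the new function exceeds V_prev.
    return [g / len(samples) for g in grad], active


samples = [-1.0, 0.5, 1.0, 2.0]  # hypothetical draws from c
theta = [0.0, 0.0, 1.0]          # current coefficients: V(x) = x^2
grad, active = gradient(theta, samples)
print(objective(theta, samples), grad, active)  # prints 1.625 [0.75, 0.5, 1.5] [-1.0, 1.0, 2.0]
```

The sample at x = 0.5 drops out of the gradient because the previous bound already dominates there. Since the objective is linear in θ once the active set is fixed, each ascent step only needs a linear objective.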

“…To improve the quality of the approximate value function, we use the approach proposed in [11] that solves a sequence of optimization problems, each with constraints of the same size as (9).…”

Section: Point-wise Maximum Approach to ADP (mentioning)

“…The steps given in [11] show how to reformulate (10c) as a polynomial inequality constraint similar to (7). The SOS S-Procedure is then applied and the resulting relaxation involves one LMI constraint with the same size as (9a), and j − 1 LMI constraints identical to (9b).…”

Section: Point-wise Maximum Approach to ADP (mentioning)

Nonlinear Control of Quadcopters via Approximate Dynamic Programming

Romero, Beuchat, Stürz, et al.

2019

2019 18th European Control Conference (ECC)

Self Cite

While Approximate Dynamic Programming has successfully been used in many applications involving discrete states and inputs such as playing the games of Tetris or chess, it has not been used in many continuous state and input space applications. In this paper, we combine Approximate Dynamic Programming techniques and apply them to the continuous, non-linear and high dimensional dynamics of a quadcopter vehicle. We use a polynomial approximation of the dynamics and sum-of-squares programming techniques to compute a family of polynomial value function approximations for different tuning parameters. The resulting approximations to the optimal value function are combined in a point-wise maximum approach, which is used to compute the online policy. The success of the method is demonstrated in both simulations and experiments on a quadcopter. The control performance is compared to a linear time-varying Model Predictive Controller. The two methods are then combined to keep the computational benefits of a short horizon MPC and the long term performance benefits of the Approximate Dynamic Programming value function as the terminal cost.
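The online policy described above can be sketched as a one-step greedy lookahead on the point-wise maximum of the value function family. Everything in this example is hypothetical: scalar dynamics x+ = x + u, stage cost x² + u², no discount, three made-up quadratic lower bounds, and a grid search over inputs in place of whatever solver the paper uses online.

```python
def stage_cost(x, u):
    return x * x + u * u


def dynamics(x, u):
    return x + u


# Hypothetical family of lower-bounding value functions, e.g. one per tuning parameter.
value_family = [
    lambda y: 0.5 * y * y,
    lambda y: y * y,
    lambda y: 0.8 * y * y + 0.1,
]


def v_max(y):
    """Point-wise maximum of the family; a lower bound if every member is one."""
    return max(v(y) for v in value_family)


def greedy_input(x, u_grid):
    """One-step lookahead: minimize stage cost plus the point-wise max at the successor."""
    return min(u_grid, key=lambda u: stage_cost(x, u) + v_max(dynamics(x, u)))


u_grid = [-2.0 + 0.25 * k for k in range(17)]  # candidate inputs in [-2, 2]
print(greedy_input(2.0, u_grid))  # prints -1.0
```

In the combined scheme the abstract describes, the same `v_max` would play the role of the terminal cost of a short-horizon MPC problem instead of a one-step lookahead.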

Cited by 6 publications

References 32 publications
