A moment and sum-of-squares extension of dual dynamic programming with application to nonlinear energy storage problems

We present an approximate method for solving nonlinear control problems over long time horizons, in which the full nonlinear model is preserved over an initial part of the horizon, while the remainder of the horizon is modeled using a linear relaxation. As this approximate problem may still be too large to solve directly, we present a Benders decomposition-based solution algorithm that iterates between solving the nonlinear and linear parts of the horizon. This extends the Dual Dynamic Programming approach commonly employed for optimization of linearized hydro power systems. We prove that the proposed algorithm converges after a finite number of iterations, even when the nonlinear initial stage problems are solved inexactly. We also bound the suboptimality of the split-horizon method with respect to the original nonlinear problem, in terms of the properties of a map between the linear and nonlinear stateinput trajectories. We then apply this method to a case study concerning a multiple reservoir hydro system, approximating the nonlinear head effects in the second stage using McCormick envelopes. We demonstrate that near-optimal solutions can be obtained in a shrinking horizon setting when the full nonlinear model is used for only a short initial section of the horizon. For this example, the approach is shown to be more practical than both conventional dynamic programming and a multi-cell McCormick envelope approximation from literature.

show abstract

“…The second inequality (15) comes from substituting in the right inequality of (13) into (14). The final inequality (16) is…”

Section: Error Bound On Two-stage Approximationmentioning

confidence: 99%

“…Since DDP cannot solve problems with nonconvex value functions, convex approximations of the model are required when using DDP for nonlinear problems. Recent extensions of this approach have considered integer programs [12], locally-valid Benders cuts [13], and polynomial-based moment relaxation [14].…”

Section: Introductionmentioning

confidence: 99%

Two-Stage Dual Dynamic Programming With Application to Nonlinear Hydro Scheduling

Flamm

Eichler

Warrington

et al. 2021

IEEE Trans. Contr. Syst. Technol.

Self Cite

View full text Add to dashboard Cite

show abstract

“…Examples include the quadratic lower bound in [14], and the polynomial derived using sum-of-squares techniques in [13]. Approximate Vfunctions represented as the pointwise maximum of multiple lower-bounding functions have been used in [1], [9], [10], [15]. Recent work utilizing a point-wise maximum representation [16] has extended the Benders decomposition argument used for linear multi-stage decision problems in Dual DP (DDP, [12]), to a general nonlinear, infinite-horizon setting.…”

Section: Introductionmentioning

confidence: 99%

“…Thus the strong duality condition in Assumption 3 holds.To prove Lipschitz continuity in Assumption 3, one must bound the gradient in u-space of the functions q i (x m , ·) for any given x m . Inspection of problem(9) shows that each new function q I+1 depends on the existing functions q 0 , . .…”

mentioning

confidence: 99%

Learning continuous $Q$-functions using generalized Benders cuts

Warrington

2019

2019 18th European Control Conference (ECC)

Self Cite

View full text Add to dashboard Cite

Q-functions are widely used in discrete-time learning and control to model future costs arising from a given control policy, when the initial state and input are given. Although some of their properties are understood, Q-functions generating optimal policies for continuous problems are usually hard to compute. Even when a system model is available, optimal control is generally difficult to achieve except in rare cases where an analytical solution happens to exist, or an explicit exact solution can be computed. It is typically necessary to discretize the state and action spaces, or parameterize the Q-function with a basis that can be hard to select a priori. This paper describes a model-based algorithm based on generalized Benders theory that yields ever-tighter outer-approximations of the optimal Qfunction. Under a strong duality assumption, we prove that the algorithm yields an arbitrarily small Bellman optimality error at any finite number of arbitrary points in the stateinput space, in finite iterations. Under additional assumptions, the same guarantee holds when the inputs are determined online by the algorithm's updating Q-function. We demonstrate these properties numerically on scalar and 8-dimensional systems.

show abstract

“…This reduces the dependency of the result on the choice of the individual objectives. Parts of these techniques are used in [13] and demonstrated in simulation on a nonlinear energy storage system with a low dimensional state-by-input space. In that application, the slow time scale allows for computationally demanding online policies.…”

Section: Introductionmentioning

confidence: 99%

Nonlinear Control of Quadcopters via Approximate Dynamic Programming

Romero

Beuchat

Stürz

et al. 2019

2019 18th European Control Conference (ECC)

Self Cite

View full text Add to dashboard Cite

While Approximate Dynamic Programming has successfully been used in many applications involving discrete states and inputs such as playing the games of Tetris or chess, it has not been used in many continuous state and input space applications. In this paper, we combine Approximate Dynamic Programming techniques and apply them to the continuous, non-linear and high dimensional dynamics of a quadcopter vehicle. We use a polynomial approximation of the dynamics and sum-of-squares programming techniques to compute a family of polynomial value function approximations for different tuning parameters. The resulting approximations to the optimal value function are combined in a point-wise maximum approach, which is used to compute the online policy. The success of the method is demonstrated in both simulations and experiments on a quadcopter. The control performance is compared to a linear time-varying Model Predictive Controller. The two methods are then combined to keep the computational benefits of a short horizon MPC and the long term performance benefits of the Approximate Dynamic Programming value function as the terminal cost.

show abstract

A moment and sum-of-squares extension of dual dynamic programming with application to nonlinear energy storage problems

Cited by 12 publications

References 39 publications

Two-Stage Dual Dynamic Programming With Application to Nonlinear Hydro Scheduling

Two-Stage Dual Dynamic Programming With Application to Nonlinear Hydro Scheduling

Learning continuous $Q$-functions using generalized Benders cuts

Nonlinear Control of Quadcopters via Approximate Dynamic Programming

Contact Info

Product

Resources

About