Abstract - We describe a generalization of the Differential Dynamic Programming trajectory optimization algorithm which accommodates box inequality constraints on the controls, without significantly sacrificing convergence quality or computational effort. To this end we describe an efficient Quadratic Programming sub-algorithm which benefits from warm starts and provides explicit Hessian factors. We demonstrate our algorithm on three simulated problems, including a 28-DoF grasping problem. Simple cost terms were sufficient to generate highly dexterous and agile grasping behaviors. A movie of the grasping results can be found at goo.gl/GlM8h

I. INTRODUCTION

Constraints on the control signal applied to an actuator are invariably present in robotic systems. Signal clamping alleviates the danger of frying one's robot by exceeding voltage limits, but will not help in finding the best signal given those limits. Optimal control algorithms turn such control limits into inequality-constrained optimization problems, which are always harder than unconstrained ones. Classic Differential Dynamic Programming (DDP), a trajectory optimizer, is efficient precisely because it parameterizes unconstrained controls. Below we describe a generalization of DDP which accommodates box inequality constraints on the controls, without significantly sacrificing convergence quality or computational effort.

In Section II we provide background on DDP and box constraints. In Section III we motivate and describe our proposed algorithm. In Section IV we describe experimental results obtained in simulation.

II. BACKGROUND

Trajectory optimization is the process of finding a state-control sequence which locally minimizes a given cost function. Shooting methods - which trace their ancestry to the two-point boundary-value problem of the venerable Maximum Principle [1] - are an important sub-class of trajectory optimization methods.
Unlike so-called direct methods, which explicitly represent the state, these methods parameterize only the controls and obtain the states by forward integration (hence "shooting"). Given the state-control trajectory, Dynamic Programming is used to find an improved control sequence. Because states are never explicitly represented in the optimization space, these methods are also known as indirect [2]. Because the dynamics are folded into the optimization, state-control trajectories are always feasible and "dynamic constraints" are unnecessary. If, additionally, the controls are unconstrained, so is the optimization search space, and shooting methods can enjoy the benefits of unconstrained optimization.

DDP is a second-order shooting method [3] which, under mild assumptions, admits quadratic convergence for any system with smooth dynamics [4]. It has been shown to possess convergence properties similar to, or slightly better than, Newton's method performed on the entire control sequence [5].

Classic DDP requires second-order derivatives of the dynamics, which are usually the most expensive part of the computation. If these are ignored one obtains a Gauss-...
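The shooting idea above can be made concrete with a minimal sketch. This is a hypothetical illustration only, not the QP-based algorithm the paper proposes: it optimizes the controls of a 1-D double integrator by projected gradient descent with finite-difference gradients, handling the box constraint inside the optimizer (by projection onto the box) rather than by clamping the signal after the fact. All names and cost weights here are illustrative choices.

```python
import numpy as np

def rollout(u, dt=0.1):
    """Forward-integrate states from controls (the 'shooting' step)."""
    x = np.zeros((len(u) + 1, 2))          # state = [position, velocity]
    for t, ut in enumerate(u):
        pos, vel = x[t]
        x[t + 1] = [pos + dt * vel, vel + dt * ut]
    return x

def cost(u, target=1.0, dt=0.1):
    """Terminal cost on reaching the target at rest, plus small control effort."""
    x = rollout(u, dt)
    return (100.0 * (x[-1, 0] - target) ** 2
            + 100.0 * x[-1, 1] ** 2
            + 1e-3 * dt * np.sum(u ** 2))

def grad(u, eps=1e-6):
    """Central finite-difference gradient of the cost w.r.t. the controls."""
    g = np.zeros_like(u)
    for i in range(len(u)):
        up, um = u.copy(), u.copy()
        up[i] += eps
        um[i] -= eps
        g[i] = (cost(up) - cost(um)) / (2 * eps)
    return g

u = np.zeros(20)                # the optimization variable: controls only
u_min, u_max = -1.0, 1.0        # box constraint on the controls
for _ in range(500):
    # projected gradient step: descend, then project back onto the box
    u = np.clip(u - 0.01 * grad(u), u_min, u_max)
```

Note that the states never appear in the search space: only `u` is optimized, and feasibility of the state trajectory is automatic because it is always obtained by integrating the dynamics.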
Abstract - Neural networks have recently solved many hard problems in Machine Learning, but their impact in control remains limited. Trajectory optimization has recently solved many hard problems in robotic control, but using it online remains challenging. Here we leverage the high-fidelity solutions obtained by trajectory optimization to speed up the training of neural network controllers. The two learning problems are coupled using the Alternating Direction Method of Multipliers (ADMM). This coupling enables the trajectory optimizer to act as a teacher, gradually guiding the network towards better solutions. We develop a new trajectory optimizer based on inverse contact dynamics, and provide not only the trajectories but also the feedback gains as training data to the network. The method is illustrated on rolling, reaching, swimming and walking tasks.
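The ADMM coupling idea can be sketched in a toy scalar problem. This is a hypothetical example, not the paper's formulation: one variable x plays the role of the trajectory-optimization solution (minimizing a cost f), the other variable z plays the role of the network's output (minimizing a fit cost g), and ADMM alternates between the two while a dual variable gradually forces them into consensus.

```python
# Toy ADMM consensus: minimize f(x) + g(z) subject to x = z,
# with f(x) = (x - a)^2 and g(z) = (z - b)^2 (illustrative choices).
a, b = 0.0, 4.0          # the two subproblems "want" different answers
rho = 1.0                # penalty weight on the consensus constraint
x = z = y = 0.0          # y is the scaled dual variable

for _ in range(100):
    # x-update: argmin_x f(x) + (rho/2) * (x - z + y)^2  (closed form)
    x = (2 * a + rho * (z - y)) / (2 + rho)
    # z-update: argmin_z g(z) + (rho/2) * (x - z + y)^2  (closed form)
    z = (2 * b + rho * (x + y)) / (2 + rho)
    # dual update accumulates the consensus residual
    y += x - z
```

Both variables converge to the consensus point (a + b) / 2 = 2: each update solves its own subproblem, while the dual variable y acts as the "teacher" signal that reconciles them, mirroring how the trajectory optimizer guides the network.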
Abstract - We present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-dependent process noise, and proceed to derive the second-order expansion of the cost-to-go. We find the correction terms that arise from the stochastic assumption. Despite having quartic and cubic terms in the initial expression, we show that these vanish, leaving us with the same quadratic structure as standard DDP.
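For reference, the deterministic quadratic structure that the abstract says is preserved is the textbook DDP expansion of the action-value function around a nominal trajectory, with running cost \(\ell\), dynamics \(f\), and next-step value function \(V'\) (the stochastic correction terms described above modify these coefficients without changing the quadratic form):

```latex
\begin{aligned}
Q_{x}  &= \ell_{x}  + f_{x}^{\top} V'_{x} \\
Q_{u}  &= \ell_{u}  + f_{u}^{\top} V'_{x} \\
Q_{xx} &= \ell_{xx} + f_{x}^{\top} V'_{xx} f_{x} + V'_{x} \cdot f_{xx} \\
Q_{ux} &= \ell_{ux} + f_{u}^{\top} V'_{xx} f_{x} + V'_{x} \cdot f_{ux} \\
Q_{uu} &= \ell_{uu} + f_{u}^{\top} V'_{xx} f_{u} + V'_{x} \cdot f_{uu}
\end{aligned}
```

with the minimizing control perturbation \(\delta u^{*} = -Q_{uu}^{-1}\,(Q_{u} + Q_{ux}\,\delta x)\).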