Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.063
Lyapunov-stable neural-network control

Abstract: Deep learning has had a far-reaching impact in robotics. Specifically, deep reinforcement learning algorithms have been highly effective in synthesizing neural-network controllers for a wide range of tasks. However, despite this empirical success, these controllers still lack theoretical guarantees on their performance, such as Lyapunov stability (i.e., all trajectories of the closed-loop system are guaranteed to converge to a goal state under the control policy). This is in stark contrast to traditional model…
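For readers skimming the abstract, the Lyapunov stability guarantee it refers to is the standard textbook condition on a candidate function V for closed-loop dynamics ẋ = f(x) with goal state x*. The sketch below is generic background, not the paper's exact formulation.

```latex
% Standard Lyapunov conditions (textbook form, not quoted from the paper)
% for closed-loop dynamics \dot{x} = f(x) with goal state x^*:
\begin{align*}
  V(x^*) &= 0, \\
  V(x) &> 0 \quad \text{for all } x \neq x^*, \\
  \dot{V}(x) = \nabla V(x)^{\top} f(x) &< 0 \quad \text{for all } x \neq x^*.
\end{align*}
% Together these conditions guarantee that trajectories of the closed-loop
% system converge to the goal state x^*.
```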

Cited by 49 publications (19 citation statements).
References 28 publications (39 reference statements).
“…We can, for example, restrict the policy to obey stable system dynamics derived from first principles (Greydanus et al., 2019; Lutter et al., 2019). Another approach is to design the model class such that the closed-loop system is passive for all parameterizations of the learned policy, thus guaranteeing stability in the sense of Lyapunov as well as bounded output energy given bounded input energy (Brogliato et al., 2007; Yang et al., 2013; Dai et al., 2021). All these methods would require significant exploration in the environment, making it even more challenging to learn successful controllers directly in the real world.…”
Section: Discussion and Outlook
confidence: 99%
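As background for the passivity argument quoted above (textbook material, not taken from the cited works), a system with input u and output y is passive if it admits a storage function S, and passivity of the closed loop is what yields both the Lyapunov stability and the bounded-energy property mentioned.

```latex
% Passivity via a storage function S(x) >= 0 (standard definition):
% the system with input u and output y is passive if, along all trajectories,
\dot{S}(x) \;\le\; u^{\top} y .
% Integrating this inequality bounds the output energy by the input energy,
% and with u = 0 and S positive definite it serves as a Lyapunov function,
% giving stability in the sense of Lyapunov.
```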
“…c) Learned Certifications for Model-based Controllers: A remaining family of methods for connecting safety and learning provides certificates of stability and constraint satisfaction for learned policies using existing model-based controllers. Stability certificates can be obtained by Lyapunov analysis of the region of attraction (RoA) [8, 20, 52] or by Lipschitz-based safety arguments [32] during training. Constraint-set certificates can be learned using control synthesis such as feedback-linearization controllers [68], CBFs [46, 53, 54, 55, 61, 64, 71], and Hamilton-Jacobi (HJ) reachability analysis [5, 23, 31].…”
Section: A Related Work 1) Safety and Learning
confidence: 99%
“…The quadratic Lyapunov candidate function (orange) violates the constraint in (2), since avoiding the obstacles implies moving away from the goal, while a valid Lyapunov function (blue) is monotonically decreasing along the demonstration path (red). Instead of explicitly specifying the Lyapunov candidate function, prior work [12] learns a function that satisfies the Lyapunov conditions in (1)-(3). In this work, we will adapt this structure to learn the Lyapunov function.…”
Section: A Lyapunov Function
confidence: 99%
“…We follow the formulation in [12] to structure a Lyapunov candidate function as follows: The policy can be taken to be related to the negative gradient of the learned Lyapunov function…”
Section: B Value Function and Policy Learning
confidence: 99%
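The last two statements describe learning a Lyapunov candidate and taking the policy to be related to its negative gradient. Below is a minimal, hypothetical sketch of that structure in PyTorch: a network-based candidate V(x) that is zero at the goal and non-negative elsewhere, with a policy proportional to -∇V(x). The architecture and the gain `k` are illustrative assumptions, not the exact formulation of [12].

```python
# Hypothetical sketch (PyTorch): a learned Lyapunov candidate V(x) that is zero
# at the goal and non-negative elsewhere, with a policy proportional to -grad V.
# The architecture and the gain k are illustrative assumptions, not the exact
# formulation of [12].
import torch
import torch.nn as nn


class LyapunovCandidate(nn.Module):
    def __init__(self, x_dim: int, x_goal: torch.Tensor, hidden: int = 64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden),
        )
        self.register_buffer("x_goal", x_goal)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # V(x) = ||phi(x) - phi(x_goal)||_1  =>  V(x_goal) = 0 and V(x) >= 0.
        # Strict positivity away from the goal is what training must enforce.
        return (self.phi(x) - self.phi(self.x_goal)).abs().sum(dim=-1)


def gradient_policy(V: LyapunovCandidate, x: torch.Tensor, k: float = 1.0) -> torch.Tensor:
    # Policy related to the negative gradient of the learned Lyapunov function:
    # u = -k * dV/dx, i.e. a descent direction on V at the current state.
    x = x.clone().requires_grad_(True)
    v = V(x).sum()
    (grad_x,) = torch.autograd.grad(v, x, create_graph=True)
    return -k * grad_x


if __name__ == "__main__":
    goal = torch.zeros(2)
    V = LyapunovCandidate(x_dim=2, x_goal=goal)
    x = torch.randn(5, 2)              # batch of states
    print(V(x))                        # candidate values, >= 0 by construction
    print(gradient_policy(V, x))       # descent directions on V
```

Training would additionally have to enforce the decrease condition on V along the closed-loop dynamics (condition (3) referenced in the statements above); this sketch only illustrates the parameterization and the gradient-based policy.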