Safe Model-Based Reinforcement Learning for Systems With Parametric Uncertainties

Mahmud, Shakil; Nivison, Scott; Bell, Zachary I.; Kamalapurkar, Rushikesh

doi:10.3389/frobt.2021.733104

Cited by 9 publications

(9 citation statements)

References 34 publications

“…In recent years, significant progress has been made in developing safe model-based reinforcement learning (SMBRL) techniques to learn safe controllers for different classes of systems. [6][7][8][9][10][11][12][13][14][15][16] While Markov decision process (MDP) based SMBRL methods have been available for discrete time systems with finite state and action spaces, [6][7][8][9] synthesizing online controllers for systems in continuous time, under output feedback, while guaranteeing stability and safety is still a challenging problem.…”

Section: Introduction (mentioning)

confidence: 99%

“…While the control barrier function results in safety guarantees, the existence of a smooth value function, in spite of a nonsmooth cost function, needs to be assumed. 16 This article is inspired by the nonlinear coordinate transformation first introduced in Reference 18. Leveraging the results of Reference 18, a barrier transformation (BT) to construct an equivalent, unconstrained optimal control problem from a state-constrained optimal control problem was introduced in Reference 14.…”

Section: Introduction (mentioning)

confidence: 99%

“…Applications such as manned aviation demand deterministic safety guarantees, and as such, online, real‐time learning in such systems is challenging. SMBRL techniques for CT deterministic systems have been studied in results such as References 13–16. In Reference 13, a SMBRL method has been developed to synthesize a real‐time safe controller online by incorporating the proximity penalty method developed in Reference 17 with the framework of control barrier functions.…”

Section: Introduction (mentioning)

confidence: 99%

“…In Reference 13, a SMBRL method has been developed to synthesize a real‐time safe controller online by incorporating the proximity penalty method developed in Reference 17 with the framework of control barrier functions. While the control barrier function results in safety guarantees, the existence of a smooth value function, in spite of a nonsmooth cost function, needs to be assumed 16 …”

Section: Introduction (mentioning)

confidence: 99%

Safe adaptive output‐feedback optimal control of a class of linear systems

Mahmud, Abudia, Nivison et al. 2024

Intl J Robust & Nonlinear

Self Cite

The objective of this research is to enable safety‐critical systems to simultaneously learn and execute optimal control policies in a safe manner to achieve complex autonomy. Learning optimal policies via trial and error, that is, traditional reinforcement learning, is difficult to implement in safety‐critical systems, particularly when task restarts are unavailable. Safe model‐based reinforcement learning techniques based on a barrier transformation have recently been developed to address this problem. However, these methods rely on full‐state feedback, limiting their usability in a real‐world environment. In this work, an output‐feedback safe model‐based reinforcement learning technique based on a novel barrier‐aware dynamic state estimator has been designed to address this issue. The developed approach facilitates simultaneous learning and execution of safe control policies for safety‐critical linear systems. Simulation results indicate that barrier transformation is an effective approach to achieve online reinforcement learning in safety‐critical systems using output feedback.
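
As a rough illustration of the barrier transformation idea mentioned in the abstract above, the following minimal Python sketch maps a box-constrained state coordinate to an unconstrained one and back. The logarithmic form of `barrier`, the bounds `a` and `A`, and both function names are assumptions chosen for illustration; neither the abstract nor the citation statements quoted here specify the transformation, so this should not be read as the authors' exact construction.

```python
import math


def barrier(x: float, a: float, A: float) -> float:
    """Map x in the open interval (a, A), with a < 0 < A, onto the real line.

    Assumed logarithmic form: it is zero at x = 0 and grows without bound
    in magnitude as x approaches either constraint boundary.
    """
    if not (a < 0.0 < A):
        raise ValueError("bounds must satisfy a < 0 < A")
    if not (a < x < A):
        raise ValueError("x must lie strictly inside (a, A)")
    return math.log((A * (a - x)) / (a * (A - x)))


def barrier_inverse(s: float, a: float, A: float) -> float:
    """Map an unconstrained coordinate s back into the interval (a, A)."""
    if not (a < 0.0 < A):
        raise ValueError("bounds must satisfy a < 0 < A")
    e = math.exp(s)  # note: overflows for very large s (roughly s > 709)
    return a * A * (e - 1.0) / (a * e - A)
```

In exact arithmetic, any finite value of the transformed coordinate `s` corresponds to a state strictly inside `(a, A)`, which is why a controller designed in the transformed coordinates cannot drive the original state across the constraint boundary.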

“…Introducing theoretical guarantees on reinforcement learning methods is an active area of research. 10 The problem of defining a search termination criterion like we seek in the multi-region scenarios appears novel in the problem space.…”

Section: Introduction (mentioning)

confidence: 99%

What, how, and when? A hybrid system approach to multi-region search and rescue

Haggerty, Strasser, Bloch et al. 2023

Open Architecture/Open Business Model Net-Centric Systems and Defense Transformation 2023

Multi-agent hybrid dynamical systems are a natural model for collaborative missions in which several steps and behaviors are required to achieve the goal of the mission. Missions are tasks featuring interacting subtasks, such as the decision of where to search, how to search, and when to transition from a search behavior to a rescue behavior. Control in hybrid systems is poorly understood. Theoretical results on state reachability rely on restrictive assumptions which hinder formal verification and optimization of such systems. Further difficulties arise if there are no a priori ordering or termination conditions on the intermediate steps and behaviors. We present a flexible framework to enable decentralized multi-agent hybrid control and demonstrate its efficacy in a class of multi-region search and rescue scenarios. We also demonstrate the importance of dynamic target modeling at both levels of the hybrid state, i.e. estimating which region targets are in, how search behavior affects this estimate, and how the targets move between and within regions.

A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Beckenbach, Osinenko, Streif 2024

Intl J Robust & Nonlinear

Reinforcement learning is commonly associated with training of reward‐maximizing (or cost‐minimizing) agents, in other words, controllers. It can be applied in model‐free or model‐based fashion, using a priori or online collected system data to train involved parametric architectures. In general, online reinforcement learning does not guarantee closed loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with classical control approaches. In this work, we suggest a method to guarantee practical stability of the system‐controller closed loop in a purely online learning setting, in other words, without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting. That is, the controller receives the state of the system and computes the control action at discrete, specifically, equidistant moments in time. The method is tested in adaptive traction control and cruise control where it proved to significantly reduce the cost.
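
The sampled setting described in the abstract above, where the control action is computed at equidistant instants and held in between, can be sketched as follows. This is only a sample-and-hold loop with a fixed linear feedback on a hypothetical scalar plant; the plant `x' = a*x + b*u`, the gain, and the function name are assumptions for illustration, and no learning or adaptation from the cited work is reproduced.

```python
def simulate_sampled_loop(x0, a, b, gain, dt_sample, steps, substeps=100):
    """Simulate the scalar plant x' = a*x + b*u under sample-and-hold control.

    The control u = -gain * x is recomputed only once per sampling period
    (dt_sample) and held constant while the plant evolves in between.
    Returns the state at each sampling instant, including the initial one.
    """
    x = float(x0)
    h = dt_sample / substeps  # inner integration step for the plant
    history = [x]
    for _ in range(steps):
        u = -gain * x  # controller sees the state only at the sampling instant
        for _ in range(substeps):
            x += h * (a * x + b * u)  # forward Euler step with u held fixed
        history.append(x)
    return history


# Example: an open-loop unstable plant (a > 0) regulated by sampled feedback.
trajectory = simulate_sampled_loop(x0=1.0, a=1.0, b=1.0, gain=3.0,
                                   dt_sample=0.1, steps=50)
```

The inner loop stands in for the continuous-time plant, while the outer loop is the digital controller; lengthening `dt_sample` shows how a feedback gain that works for fast sampling can fail when the action is held too long.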
