Safe Exploration for Active Learning with Gaussian Processes

Schreiter, Jens; Nguyen-Tuong, Duy; Eberts, Mona; Bischoff, Bastian; Markert, Heiner; Toussaint, Marc

doi:10.1007/978-3-319-23461-8_9

Cited by 68 publications (97 citation statements)

References 17 publications


“…Schreiter et al [22] propose a safe exploration strategy SAL for a similar problem to ours. They optimize a function in a safe manner where the feasible region is unknown.…”

Section: Safe Exploration (mentioning, confidence: 99%)

“…
• PIBU: Bayesian optimization with PIBU (Equation (17))
• CMA: Covariance matrix adaptation [9] algorithm
• UCB: Bayesian optimization with the acquisition function upper confidence bound (UCB) [3]
• PoWER: Policy search algorithm [11]
• CORL + PIBU: Algorithm 1 with PIBU (Equation (17))
• CORL + UCB: Algorithm 1 with UCB
• CORL + SAL: Algorithm 1 with SAL [22]
• CORL + CMA: Algorithm 1 with CMA

Only the PIBU and SAL variants aim for a safe exploration during the optimization process. Note that SAL assumes to observe the distance to the feasibility boundary in critical (but feasible) regions, which all other methods do not observe.…”

Section: A. Evaluation of CORL on a Synthetic Benchmark (mentioning, confidence: 99%)


Combined Optimization and Reinforcement Learning for Manipulation Skills

Englert, Toussaint

Robotics: Science and Systems XII

Self-citation

Abstract: This work addresses the problem of how a robot can improve a manipulation skill in a sample-efficient and secure manner. As an alternative to the standard reinforcement learning formulation where all objectives are defined in a single reward function, we propose a generalized formulation that consists of three components: 1) a known analytic control cost function; 2) a black-box return function; and 3) a black-box binary success constraint. While the overall policy optimization problem is high-dimensional, in typical robot manipulation problems we can assume that the black-box return and constraint only depend on a lower-dimensional projection of the solution. With our formulation we can exploit this structure for a sample-efficient learning framework that iteratively improves the policy with respect to the objective functions under the success constraint. We employ efficient second-order optimization methods to optimize the high-dimensional policy w.r.t. the analytic cost function while keeping the lower-dimensional projection fixed. This is alternated with safe Bayesian optimization over the lower-dimensional projection to address the black-box return and success constraint. During both improvement steps the success constraint is used to keep the optimization in a secure region and to clearly distinguish between motions that lead to success or failure. The learning algorithm is evaluated on a simulated benchmark problem and a door opening task with a PR2.
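The safe Bayesian optimization step described in the abstract above can be illustrated with a minimal, generic selection rule. This is a sketch under stated assumptions, not CORL's PIBU acquisition function: it assumes that Gaussian process posterior means and standard deviations for the return and for a continuous-valued constraint (the abstract's constraint is binary) have already been computed on a finite candidate set. The function name `select_safe_candidate` and the confidence scaling `beta` are hypothetical. A candidate counts as safe only if the constraint's lower confidence bound is non-negative; among safe candidates, the one with the largest upper confidence bound on the return is proposed.

```python
from typing import Optional, Sequence


def select_safe_candidate(
    mu_return: Sequence[float],
    sd_return: Sequence[float],
    mu_constraint: Sequence[float],
    sd_constraint: Sequence[float],
    beta: float = 2.0,
) -> Optional[int]:
    """Return the index of the next candidate to evaluate, or None if none is safe.

    Safe set: candidates whose pessimistic (lower) confidence bound on the
    constraint is >= 0. Among them, pick the largest optimistic (upper)
    confidence bound on the return.
    """
    best_idx: Optional[int] = None
    best_ucb = float("-inf")
    for i in range(len(mu_return)):
        lcb_constraint = mu_constraint[i] - beta * sd_constraint[i]
        if lcb_constraint < 0.0:
            continue  # possibly infeasible: never evaluate this candidate
        ucb_return = mu_return[i] + beta * sd_return[i]
        if ucb_return > best_ucb:
            best_ucb, best_idx = ucb_return, i
    return best_idx
```

With `mu_return = [1.0, 3.0, 2.0]`, `sd_return = [0.5, 0.5, 0.5]`, `mu_constraint = [1.0, 0.2, 1.5]` and `sd_constraint = [0.1, 0.5, 0.2]`, candidate 1 has the highest mean return but a negative constraint lower bound, so the rule returns index 2. Restricting the search to the pessimistic safe set trades some optimism for never proposing points the model considers possibly infeasible.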


“…[19], or by safety constrained Bayesian optimization as e.g. in [20], [21]. These techniques share the limitation that they need to be tailored to a task-specific class of policies.…”

Section: Related Work (mentioning, confidence: 99%)

“…The model is computed according to Section V-A based on measurements of system (21) as depicted in Figure 6, sensor noise $\sigma_s = 0.01$ and prior distribution $\Sigma^{p}_{i} = 10 I_n$. The state feedback…”

Section: A. Details of Numerical Example (mentioning, confidence: 99%)
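The Bayesian model mentioned in this snippet can be illustrated in its simplest form. This is a minimal sketch under stated assumptions: a hypothetical scalar model y = θ·x + ε (the cited work estimates a multivariable linear system), a Gaussian prior θ ~ N(0, 10) matching one diagonal entry of the prior 10·I_n, and the assumption that σ_s = 0.01 denotes the noise standard deviation rather than the variance. The name `blr_posterior` is hypothetical. The standard conjugate update gives a posterior precision equal to the prior precision plus Σx²/σ_s², and a posterior mean equal to the posterior variance times Σxy/σ_s².

```python
from typing import Sequence, Tuple


def blr_posterior(
    xs: Sequence[float],
    ys: Sequence[float],
    prior_var: float = 10.0,
    noise_std: float = 0.01,
) -> Tuple[float, float]:
    """Posterior mean and variance of theta in y = theta * x + eps.

    Assumed prior: theta ~ N(0, prior_var); assumed noise: eps ~ N(0, noise_std**2).
    """
    noise_var = noise_std ** 2
    # Precisions add: prior precision plus the information from each measurement.
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    post_var = 1.0 / precision
    # Zero prior mean, so only the data term contributes to the posterior mean.
    post_mean = post_var * sum(x * y for x, y in zip(xs, ys)) / noise_var
    return post_mean, post_var
```

For example, two noise-free observations of θ = 0.5 (`xs = [1.0, 2.0]`, `ys = [0.5, 1.0]`) give a posterior mean within about 1e-6 of 0.5 and a posterior variance of roughly 2e-5, far below the prior variance of 10; with no data, the prior is returned unchanged.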

Linear Model Predictive Safety Certification for Learning-Based Control

Wabersich, Zeilinger

2018 IEEE Conference on Decision and Control (CDC)

104

Reinforcement learning (RL) methods have demonstrated their efficiency in simulation environments. However, many applications for which RL offers great potential, such as autonomous driving, are also safety-critical and require a certified closed-loop behavior in order to meet safety specifications in the presence of physical constraints. This paper introduces a concept, called probabilistic model predictive safety certification (PMPSC), which can be combined with any RL algorithm and provides provable safety certificates in terms of state and input chance constraints for potentially large-scale systems. The certificate is realized through a stochastic tube that safely connects the current system state with a terminal set of states that is known to be safe. A novel formulation in terms of a convex receding horizon problem allows a recursively feasible real-time computation of such probabilistic tubes, despite the presence of possibly unbounded disturbances. A design procedure for MPSC relying on Bayesian inference and recent advances in probabilistic set invariance is presented. Using a numerical car simulation, the method and its design procedure are illustrated by enhancing a simple RL algorithm with safety certificates.
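The core idea of a model predictive safety filter (pass the learning-based input through when a safe continuation to a terminal safe set still exists, otherwise modify it as little as possible) can be sketched for a deliberately simple case. This is a deterministic, scalar simplification under stated assumptions, not the probabilistic PMPSC scheme of the abstract: it assumes a known model x⁺ = a·x + b·u with a > 0 and b > 0, box constraints |x| ≤ x_max and |u| ≤ u_max, and a terminal set |x| ≤ x_term that is assumed control invariant (x_term ≤ x_max and (a − 1)·x_term ≤ b·u_max). For such a scalar system, the states that can reach the terminal set in k steps form an interval computable backwards, so no optimization solver is needed. The function names are hypothetical.

```python
from typing import Optional


def reachable_radius(a: float, b: float, x_max: float, u_max: float,
                     x_term: float, steps: int) -> float:
    """Radius r such that every |x| <= r can be steered into |x| <= x_term
    in `steps` steps of x+ = a*x + b*u, with |u| <= u_max and |x| <= x_max.

    Assumes a > 0 and b > 0.
    """
    r = x_term  # states already in the terminal set need zero steps
    for _ in range(steps):
        # Some |u| <= u_max gives |a*x + b*u| <= r iff |x| <= (r + b*u_max) / a.
        r = min(x_max, (r + b * u_max) / a)
    return r


def safety_filter(x: float, u_rl: float, a: float, b: float, x_max: float,
                  u_max: float, x_term: float, horizon: int) -> Optional[float]:
    """Return the certified input closest to u_rl, or None if none exists.

    An input is certified if the next state can still reach the terminal set
    in the remaining horizon - 1 steps without violating the constraints.
    """
    r_next = reachable_radius(a, b, x_max, u_max, x_term, horizon - 1)
    # Inputs with |a*x + b*u| <= r_next, intersected with |u| <= u_max.
    u_lo = max(-u_max, (-r_next - a * x) / b)
    u_hi = min(u_max, (r_next - a * x) / b)
    if u_lo > u_hi:
        return None  # the current state cannot be certified with this horizon
    return min(max(u_rl, u_lo), u_hi)  # project the learning-based input
```

With a = 1.2, b = 1, x_max = 5, u_max = 1, x_term = 1 and a horizon of 3, a proposed input of 0.3 at x = 0.5 passes unchanged, a proposed input of 1.0 at x = 1.5 is reduced to about 0.42, and at x = 4.0 no certified input exists, so the filter returns None.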


“…Due to the inherent uncertainty, the worst case scenario (e.g., possible lowest rewards) is typically taken into account [13], [17] and the set of safe policies can be expanded by exploring the states [4], [5]. To address the issue of this uncertainty for nonlinear-model estimation tasks, Gaussian process regression [18] is a strong tool, and many safe learning studies have taken advantage of its property (e.g., [4], [6], [7], [10], [13]).…”

Section: Introduction (mentioning, confidence: 99%)
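The worst-case reasoning with Gaussian process regression described in this snippet can be sketched as a pessimistic safety check: a query point is accepted only if the lower confidence bound, mean − β·std, of a GP model of some safety-related measurement stays above a threshold. This is a minimal, dependency-free sketch under stated assumptions: one-dimensional inputs, a zero-mean GP with a squared-exponential kernel (unit length scale and variance), a fixed noise variance of 1e-4, β = 2 and a threshold of 0. All names and hyperparameters are illustrative, and this is not the SAL procedure of Schreiter et al.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x1: float, x2: float, lengthscale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential kernel."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L z = b."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def solve_upper_t(low: List[List[float]], z: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T w = z."""
    n = len(z)
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(low[k][i] * w[k] for k in range(i + 1, n))) / low[i][i]
    return w


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x_star: float,
                 noise_var: float = 1e-4) -> Tuple[float, float]:
    """Posterior mean and variance of a zero-mean GP at x_star."""
    n = len(xs)
    k = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, ys))  # (K + noise*I)^-1 y
    k_star = [rbf(x_star, xi) for xi in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = max(rbf(x_star, x_star) - sum(vi * vi for vi in v), 0.0)
    return mean, var


def is_safe(xs: Sequence[float], ys: Sequence[float], x_star: float,
            beta: float = 2.0, threshold: float = 0.0) -> bool:
    """Pessimistic check: accept only if mean - beta * std >= threshold."""
    mean, var = gp_posterior(xs, ys, x_star)
    return mean - beta * math.sqrt(var) >= threshold
```

With measurements of 1.0 at x = 0, 0.5 and 1, the check accepts x = 0.5, where the posterior is confident, and rejects x = 5.0, where the posterior has essentially reverted to the prior and the lower bound is close to −2. Exploring only where the lower bound is acceptable is what lets the safe region grow gradually as data accumulate.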

Barrier-Certified Adaptive Reinforcement Learning With Applications to Brushbot Navigation

et al. 2019


This paper presents a safe learning framework that employs an adaptive model learning algorithm together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique. We use the learned model in combination with control barrier certificates which constrain policies (feedback controllers) in order to maintain safety, which refers to avoiding particular undesirable regions of the state space. Under certain conditions, recovery of safety in the sense of Lyapunov stability after violations of safety due to the nonstationarity is guaranteed. In addition, we reformulate an action-value function approximation to make any kernel-based nonlinear function estimation method applicable to our adaptive learning framework. Lastly, solutions to the barrier-certified policy optimization are guaranteed to be globally optimal, ensuring the greedy policy improvement under mild conditions. The resulting framework is validated via simulations of a quadrotor, which has previously been used under stationarity assumptions in the safe learning literature, and is then tested on a real robot, the brushbot, whose dynamics are unknown, highly complex, and nonstationary.

Index Terms: Safe learning, control barrier certificate, sparse optimization, kernel adaptive filter, brushbot
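The barrier-certificate idea of constraining a nominal policy so that the safe set {x : h(x) ≥ 0} stays invariant can be illustrated with the standard continuous-time control barrier function condition ∂h/∂x · (f(x) + g(x)·u) ≥ −α·h(x). This is a generic sketch, not the paper's adaptive, kernel-based framework: it assumes a known scalar control-affine model and a single input, in which case the usual quadratic program (find the input closest to the nominal one that satisfies the condition) reduces to clipping against one linear inequality c·u ≥ d. The function name `cbf_filter` and its arguments are hypothetical.

```python
from typing import Callable, Optional


def cbf_filter(
    x: float,
    u_nom: float,
    f: Callable[[float], float],
    g: Callable[[float], float],
    h: Callable[[float], float],
    dh: Callable[[float], float],
    alpha: float = 1.0,
) -> Optional[float]:
    """Input closest to u_nom satisfying dh(x) * (f(x) + g(x) * u) >= -alpha * h(x).

    Returns None if no input satisfies the condition at this state.
    """
    c = dh(x) * g(x)                      # coefficient of u in the constraint
    d = -alpha * h(x) - dh(x) * f(x)      # constraint reads c * u >= d
    if c > 0.0:
        return max(u_nom, d / c)          # lower bound on u
    if c < 0.0:
        return min(u_nom, d / c)          # upper bound on u
    return u_nom if d <= 0.0 else None    # constraint does not depend on u


# Example: single integrator x' = u with safe set |x| <= 1, i.e. h(x) = 1 - x**2.
u_safe = cbf_filter(0.5, 2.0, f=lambda x: 0.0, g=lambda x: 1.0,
                    h=lambda x: 1.0 - x ** 2, dh=lambda x: -2.0 * x)
print(u_safe)  # 0.75
```

In the example, the nominal input 2.0 would push the state toward the boundary too quickly, so it is reduced to 0.75, the largest input that still satisfies the condition at x = 0.5; inputs that already satisfy it are passed through unchanged.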
