Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 2021
DOI: 10.24963/ijcai.2021/297

Verifying Reinforcement Learning up to Infinity

Abstract: Formally verifying that reinforcement learning systems act safely is increasingly important, but existing methods only verify over finite time. This is of limited use for dynamical systems that run indefinitely. We introduce the first method for verifying the time-unbounded safety of neural networks controlling dynamical systems. We develop a novel abstract interpretation method which, by constructing adaptable template-based polyhedra using MILP and interval arithmetic, yields sound---safe and invaria…
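The abstract mentions interval arithmetic as one ingredient of the abstraction. A minimal sketch of that ingredient (not the paper's implementation; the weights and layer shape below are illustrative assumptions) is sound box propagation through one affine + ReLU layer:

```python
# Sketch: propagate an input box [lo, hi] through y = ReLU(W @ x + b),
# producing output bounds that are guaranteed to contain every reachable value.
# W and b are made-up example values, not taken from the paper.

def affine_interval(lo, hi, W, b):
    """Sound elementwise bounds for W @ x + b when lo[j] <= x[j] <= hi[j]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        lo_acc, hi_acc = bias, bias
        for w, l, h in zip(row, lo, hi):
            if w >= 0:          # positive weight: lower bound uses l, upper uses h
                lo_acc += w * l
                hi_acc += w * h
            else:               # negative weight: bounds swap
                lo_acc += w * h
                hi_acc += w * l
        out_lo.append(lo_acc)
        out_hi.append(hi_acc)
    return out_lo, out_hi

def relu_interval(lo, hi):
    """ReLU is monotone, so it applies directly to each bound."""
    return [max(0.0, l) for l in lo], [max(0.0, h) for h in hi]

W = [[1.0, -2.0], [0.5, 1.0]]
b = [0.0, -1.0]
lo, hi = affine_interval([0.0, 0.0], [1.0, 1.0], W, b)
lo, hi = relu_interval(lo, hi)   # lo = [0.0, 0.0], hi = [1.0, 0.5]
```

Template polyhedra generalize this idea from axis-aligned boxes to bounds along a chosen set of directions, with MILP used where intervals are too coarse.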

Cited by 14 publications (10 citation statements)
References 0 publications
“…(ii) Adaptive cruise control [2]: The problem has two vehicles i ∈ {lead, ego}, whose state is determined by variables x_i and v_i for the position and speed of each car, respectively. The lead car proceeds at constant speed (28 m s⁻¹), and the agent controls the acceleration (±1 m s⁻²) of ego using two actions.…”
Section: Methods
mentioning confidence: 99%
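The adaptive cruise control benchmark quoted above can be sketched as discrete-time dynamics. The time step and the Euler update below are assumptions for illustration, not taken from the cited paper:

```python
# Sketch of the two-vehicle ACC dynamics described in the citation:
# lead car at constant 28 m/s, ego acceleration chosen from {-1, +1} m/s^2.
# DT and the Euler integration scheme are assumed, not from the paper.

DT = 0.1  # integration step in seconds (assumption)

def step(state, action):
    """One Euler step. state = (x_lead, v_lead, x_ego, v_ego); action in {-1.0, +1.0}."""
    x_lead, v_lead, x_ego, v_ego = state
    x_lead += v_lead * DT      # lead proceeds at constant speed
    x_ego += v_ego * DT
    v_ego += action * DT       # agent controls ego acceleration (+-1 m/s^2)
    return (x_lead, v_lead, x_ego, v_ego)

s = step((50.0, 28.0, 0.0, 25.0), +1.0)
# s = (52.8, 28.0, 2.5, 25.1)
```

A safety property here would be an invariant such as x_lead - x_ego staying above a minimum gap for all time, which is exactly the kind of time-unbounded claim the paper verifies.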
“…Formal verification of RL, but in a non-probabilistic setting, includes: [5], which extracts and analyses decision trees; [27], which checks safety and liveness properties for deep RL; and [2], which also uses template polyhedra and MILP to build abstractions, but to check (non-probabilistic) safety invariants.…”
Section: Related Work
mentioning confidence: 99%
“…To date, the DNN verification community has focused primarily on feed-forward DNNs [21], [27], [30], [41], [68]. Some work has been carried out on verifying DRL networks, which pose greater challenges: beyond the general scalability challenges of DNN verification, in DRL verification we must also take into account that agents typically interact with a reactive environment [5], [9], [13], [17]. In particular, these agents are invoked multiple times, and the inputs of each invocation are usually affected by the outputs of the previous invocations.…”
Section: Introduction
mentioning confidence: 99%
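The closed-loop structure this citation describes — each controller invocation's input is the state produced by the previous step — can be sketched minimally. The controller and dynamics below are toy placeholders, not the cited work's models:

```python
# Sketch of closed-loop agent-environment interaction: verifying the network in
# isolation is not enough, because states are fed back through the environment.
# controller() and plant() are made-up stand-ins for a policy and dynamics.

def controller(x):
    """Toy stand-in for a trained policy: decelerate once position exceeds 1."""
    return -0.5 if x > 1.0 else 0.5

def plant(x, u):
    """Toy one-dimensional dynamics: next state = state + action."""
    return x + u

def rollout(x0, horizon):
    """Each invocation's input is the output of the previous step."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(plant(xs[-1], controller(xs[-1])))
    return xs

traj = rollout(0.0, 4)
# traj = [0.0, 0.5, 1.0, 1.5, 1.0]
```

Time-unbounded verification, as in the paper this page indexes, must reason about all such feedback trajectories at once rather than unrolling them to a fixed horizon.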