2018
DOI: 10.48550/arxiv.1806.04225
Preprint

PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments

Abstract: Our goal is to synthesize controllers for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between robustness of controllers to novel environments and generalization of hypotheses in supervised learning. In particular, we utilize the Probably Approximately Correct (PAC)-Baye…


Cited by 4 publications (13 citation statements) | References 35 publications
“…On the theoretical front, an important direction for future work is to provide rigorous guarantees on generalization to novel domains. One potential avenue is to combine the algorithmic techniques presented here with recent results on PAC-Bayes generalization theory applied to control and RL settings [13,36]. On the algorithmic front, an interesting direction is to use domain randomization techniques to automatically generate new training domains that can be used to improve invariant policy learning (e.g., automatically generating domains with different colored keys in the colored-keys example).…”
Section: Discussion
confidence: 99%
“…Distributional robustness. The PAC-Bayes Control approach [13,14] provides a way to make provable generalization guarantees under distributional shifts. This approach is particularly useful in safety-critical applications where it is important to quantify the impact of switching between a training domain and a test domain.…”
Section: Related Work
confidence: 99%
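The guarantee these statements refer to is, at its core, a PAC-Bayes bound: with probability at least 1 − δ over m sampled training environments, the true expected cost of a posterior distribution over policies is bounded by a KL-inverse of its empirical cost. A minimal sketch of evaluating such a McAllester-style bound (the numerical inputs and function names are illustrative, not taken from the paper):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_inverse_upper(emp_cost, rhs, tol=1e-9):
    """Largest q >= emp_cost satisfying kl(emp_cost || q) <= rhs, by bisection."""
    lo, hi = emp_cost, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(emp_cost, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_bound(emp_cost, kl_posterior_prior, m, delta=0.01):
    """McAllester-style upper bound on the true expected cost:
    kl(emp_cost || true_cost) <= (KL(posterior || prior) + ln(2*sqrt(m)/delta)) / m,
    inverted to give an upper bound on true_cost."""
    rhs = (kl_posterior_prior + math.log(2 * math.sqrt(m) / delta)) / m
    return kl_inverse_upper(emp_cost, rhs)

# Illustrative numbers: 5% empirical cost over 1000 environments, KL term of 2 nats.
bound = pac_bayes_bound(emp_cost=0.05, kl_posterior_prior=2.0, m=1000)
```

The bound holds uniformly over the cost's true value, which is what lets it certify performance on environments never seen during training.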
“…Majumdar et al. explore one potential theory for an automated guidance system, which leverages another form of machine learning, and utilize a similar framework for both the navigation of a UAS and the task of grasping an object [10]. One issue with the grasping framework presented in the paper can be found in the mass measurement used for crucial grasping calculations [10]. The mass used in their calculations is a randomly generated number in the range [0.05, 0.15] kg [10], which could cause inaccurate results in a real setting.…”
Section: Potential Computer Vision
confidence: 99%
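The mass randomization criticized above amounts to drawing each training object's mass uniformly from [0.05, 0.15] kg, a common domain-randomization device. A minimal sketch of that sampling step (the helper name and sample count are illustrative assumptions, not the authors' code):

```python
import random

def sample_object_mass(rng, low_kg=0.05, high_kg=0.15):
    """Draw an object mass uniformly from [low_kg, high_kg] kg, as in the
    randomized grasping setup described in the quoted statement."""
    return rng.uniform(low_kg, high_kg)

# Generate masses for a handful of randomized training environments.
rng = random.Random(0)
masses = [sample_object_mass(rng) for _ in range(5)]
```

The cited criticism is that a mass drawn this way stands in for a measured value, so grasp calculations tuned to it may not transfer to a real object whose true mass falls outside the trained behavior.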