Kishor Jothimurugan scite author profile

Kishor Jothimurugan

4Publications

42Citation Statements Received

59Citation Statements Given

How they've been cited

How they cite others

116

Affiliations

Philadelphia University, University of Pennsylvania, California University of Pennsylvania

Publications

Order By: Most citations

Specification-Guided Learning of Nash Equilibria with High Social Welfare

Jothimurugan

Bansal

Bastani

et al. 2022

View full text Add to dashboard Cite

Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the objective of each agent. Then, guided by the structure of the specifications, our algorithm searches over policies to identify one that provably forms an$$\epsilon $$ϵ-Nash equilibrium (with high probability). Importantly, it prioritizes policies in a way that maximizes social welfare across all agents. Our empirical evaluation demonstrates that our algorithm computes equilibrium policies with high social welfare, whereas state-of-the-art baselines either fail to compute Nash equilibria or compute ones with comparatively lower social welfare.

show abstract

Compositional Learning and Verification of Neural Network Controllers

Ivanov

Jothimurugan

Hsu

et al. 2021

ACM Trans. Embed. Comput. Syst.

View full text Add to dashboard Cite

Recent advances in deep learning have enabled data-driven controller design for autonomous systems. However, verifying safety of such controllers, which are often hard-to-analyze neural networks, remains a challenge. Inspired by compositional strategies for program verification, we propose a framework for compositional learning and verification of neural network controllers. Our approach is to decompose the task (e.g., car navigation) into a sequence of subtasks (e.g., segments of the track), each corresponding to a different mode of the system (e.g., go straight or turn). Then, we learn a separate controller for each mode, and verify correctness by proving that (i) each controller is correct within its mode, and (ii) transitions between modes are correct. This compositional strategy not only improves scalability of both learning and verification, but also enables our approach to verify correctness for arbitrary compositions of the subtasks. To handle partial observability (e.g., LiDAR), we additionally learn and verify a mode predictor that predicts which controller to use. Finally, our framework also incorporates an algorithm that, given a set of controllers, automatically synthesizes the pre- and postconditions required by our verification procedure. We validate our approach in a case study on a simulation model of the F1/10 autonomous car, a system that poses challenges for existing verification tools due to both its reliance on LiDAR observations, as well as the need to prove safety for complex track geometries. We leverage our framework to learn and verify a controller that safely completes any track consisting of an arbitrary sequence of five kinds of track segments.

show abstract

A Framework for Transforming Specifications in Reinforcement Learning

Alur¹,

Bansal²,

Bastani³

et al. 2022

View full text Add to dashboard Cite

A Framework for Transforming Specifications in Reinforcement Learning

Alur¹,

Bansal²,

Bastani³

et al. 2021

Preprint

View full text Add to dashboard Cite

Reactive synthesis algorithms allow automatic construction of policies to control an environment modeled as a Markov Decision Process (MDP) that are optimal with respect to high-level temporal logic specifications assuming the MDP model is known a priori. Reinforcement learning algorithms, in contrast, are designed to learn an optimal policy when the transition probabilities of the MDP are unknown, but require the user to associate local rewards with transitions. The appeal of highlevel temporal logic specifications has motivated research to develop RL algorithms for synthesis of policies from specifications. To understand the techniques, and nuanced variations in their theoretical guarantees, in the growing body of resulting literature, we develop a formal framework for defining transformations among RL tasks with different forms of objectives. We define the notion of sampling-based reduction to relate two MDPs whose transition probabilities can be learnt by sampling, followed by formalization of preservation of optimal policies, convergence, and robustness. We then use our framework to restate known results, establish new results to fill in some gaps, and identify open problems.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.