Reinforcement learning has been shown to be an effective strategy for automatically training policies for challenging control problems. Focusing on non-cooperative multi-agent systems, we propose a novel reinforcement learning framework for training joint policies that form a Nash equilibrium. In our approach, rather than providing low-level reward functions, the user provides high-level specifications that encode the objective of each agent. Then, guided by the structure of the specifications, our algorithm searches over policies to identify one that provably forms an$$\epsilon $$ϵ-Nash equilibrium (with high probability). Importantly, it prioritizes policies in a way that maximizes social welfare across all agents. Our empirical evaluation demonstrates that our algorithm computes equilibrium policies with high social welfare, whereas state-of-the-art baselines either fail to compute Nash equilibria or compute ones with comparatively lower social welfare.
Recent advances in deep learning have enabled data-driven controller design for autonomous systems. However, verifying safety of such controllers, which are often hard-to-analyze neural networks, remains a challenge. Inspired by compositional strategies for program verification, we propose a framework for compositional learning and verification of neural network controllers. Our approach is to decompose the task (e.g., car navigation) into a sequence of subtasks (e.g., segments of the track), each corresponding to a different mode of the system (e.g., go straight or turn). Then, we learn a separate controller for each mode, and verify correctness by proving that (i) each controller is correct within its mode, and (ii) transitions between modes are correct. This compositional strategy not only improves scalability of both learning and verification, but also enables our approach to verify correctness for arbitrary compositions of the subtasks. To handle partial observability (e.g., LiDAR), we additionally learn and verify a mode predictor that predicts which controller to use. Finally, our framework also incorporates an algorithm that, given a set of controllers, automatically synthesizes the pre- and postconditions required by our verification procedure. We validate our approach in a case study on a simulation model of the F1/10 autonomous car, a system that poses challenges for existing verification tools due to both its reliance on LiDAR observations, as well as the need to prove safety for complex track geometries. We leverage our framework to learn and verify a controller that safely completes any track consisting of an arbitrary sequence of five kinds of track segments.
Reactive synthesis algorithms allow automatic construction of policies to control an environment modeled as a Markov Decision Process (MDP) that are optimal with respect to high-level temporal logic specifications assuming the MDP model is known a priori. Reinforcement learning algorithms, in contrast, are designed to learn an optimal policy when the transition probabilities of the MDP are unknown, but require the user to associate local rewards with transitions. The appeal of highlevel temporal logic specifications has motivated research to develop RL algorithms for synthesis of policies from specifications. To understand the techniques, and nuanced variations in their theoretical guarantees, in the growing body of resulting literature, we develop a formal framework for defining transformations among RL tasks with different forms of objectives. We define the notion of sampling-based reduction to relate two MDPs whose transition probabilities can be learnt by sampling, followed by formalization of preservation of optimal policies, convergence, and robustness. We then use our framework to restate known results, establish new results to fill in some gaps, and identify open problems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.