Despite recent advances in deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to their training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent's policy can easily get stuck in a poor local optimum with respect to its training partners: the learned policy may be optimal only against the other agents' current policies. In this paper, we focus on training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents still generalize when their opponents' policies change. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient (MADDPG) algorithm for robust policy learning; (2) since the continuous action space makes our minimax learning objective computationally intractable, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve the proposed formulation. We empirically evaluate M3DDPG in four mixed cooperative and competitive multi-agent environments, where agents trained by our method significantly outperform existing baselines.
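The core idea behind this kind of minimax robustification is to approximate the inner minimization over other agents' actions with a single gradient step on the critic rather than solving it exactly. A minimal numpy sketch of such a one-step worst-case perturbation, using a toy analytic critic (`q_value`, `eps`, and the quadratic payoff form are illustrative assumptions, not the paper's learned network):

```python
import numpy as np

def q_value(a_i, a_j):
    """Toy differentiable critic: agent i prefers its action near 1.0
    and is penalized when the opponent's action a_j drifts from a_i."""
    return -(a_i - 1.0) ** 2 - 0.5 * (a_i - a_j) ** 2

def grad_q_wrt_opponent(a_i, a_j):
    # Analytic gradient of q_value with respect to the opponent's action a_j.
    return a_i - a_j

def worst_case_q(a_i, a_j, eps=0.1):
    """One-step adversarial perturbation of the opponent's action:
    move a_j a small step in the direction that *decreases* Q,
    then evaluate the critic at the perturbed joint action."""
    g = grad_q_wrt_opponent(a_i, a_j)
    a_j_adv = a_j - eps * np.sign(g)  # FGSM-style worst-case step
    return q_value(a_i, a_j_adv)

q_nominal = q_value(1.0, 0.8)      # critic under the opponent's actual action
q_robust = worst_case_q(1.0, 0.8)  # critic under the perturbed opponent
```

Training against `q_robust` instead of `q_nominal` is what makes the learned policy hedge against small, adversarial shifts in the other agents' behavior.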
The Stackelberg Security Game (SSG) model has been immensely influential in security research since it was introduced roughly a decade ago. Moreover, deployed SSG-based applications are among the most successful examples of game theory applied in the real world. We present a broad survey of recent technical advances in SSG and related literature, and then look to the future by highlighting potential new applications and open research problems in SSG.
In recent years, Stackelberg Security Games have been successfully applied to solve resource allocation and scheduling problems in several security domains. However, previous work has mostly assumed that the targets are stationary relative to the defender and the attacker, leading to discrete game models with finite numbers of pure strategies. This paper, in contrast, focuses on protecting mobile targets, which leads to a continuous set of strategies for the players. The problem is motivated by several real-world domains, including protecting ferries with escort boats and protecting refugee supply lines. Our contributions include: (i) a new game model for multiple mobile defender resources and moving targets, with a discretized strategy space for the defender and a continuous strategy space for the attacker; (ii) an efficient linear-programming-based solution that uses a compact representation for the defender's mixed strategy while accurately modeling the attacker's continuous strategy using a novel sub-interval analysis method; (iii) discussion and analysis of multiple heuristic methods for equilibrium refinement to improve the robustness of the defender's mixed strategy; (iv) discussion of approaches for sampling actual defender schedules from the defender's mixed strategy; and (v) detailed experimental analysis of our algorithms in the ferry protection domain.
Artificial Intelligence (AI) is currently seeing major media interest, significant interest from federal agencies, and interest from society in general. From its origins in the 1950s, to the early optimistic predictions of its founders, to some recent negative views put forth by the media, AI has seen its share of ups and downs in public interest. Yet the steady progress made in the past 50-60 years in basic AI research, the availability of massive amounts of data, and vast advances in computing power have now brought us to a unique and exciting phase in AI history. It is now up to us to shape the evolution of AI research. AI can be a major force for social good; this depends in part on how we shape this new technology and the questions we use to inspire young researchers. Currently there is a significant spotlight on the ethical, safety, and legal concerns raised by future applications of AI. While understanding and grappling with these concerns, and shaping the long-term future, is a legitimate part of AI research and policy making, we must not ignore the societal benefits that AI is delivering, and can deliver, in the near future, and how our actions today can shape the future of AI. The Computing Community Consortium (CCC), along with the White House Office of Science and Technology Policy (OSTP) and the Association for the Advancement of Artificial Intelligence (AAAI), co-sponsored a public workshop on Artificial Intelligence for Social Good on June 7th, 2016 in Washington, DC. This was one of five workshops that OSTP co-sponsored and held around the country to spur public dialogue on artificial intelligence and machine learning, and to identify the challenges and opportunities related to AI. The AI for Social Good workshop discussed successful deployments and potential uses of AI on topics essential for social good, including but not limited to urban computing, health, environmental sustainability, and public welfare.
This report highlights each of these as well as a number of crosscutting issues.

Urban Computing

Urban computing pertains to the study and application of computing technology in urban areas. As such, it is intimately tied to urban planning, specifically infrastructure, including transportation, communication, and distribution networks. The urban computing workshop session focused primarily on transportation networks, the goal being to use AI technology to improve mobility and safety. We envision a future in which it is significantly easier to get people to the things they need and the things they want, including, but not limited to, education, jobs, healthcare, and personal services of all kinds (supermarkets, banks, etc.). Time spent commuting to school or to work is time not spent working, studying, or with one's family. When people do not have easy access to preventative healthcare, the later costs of reversing adverse developments can far exceed those that would have been incurred had appropriate preventative measures been applied (Preventive Healthcare, 2016)....
Objective: Knee osteoarthritis (KOA) is a heterogeneous condition representing a variety of potentially distinct phenotypes. The purpose of this study was to apply innovative machine learning approaches to KOA phenotyping in order to define progression phenotypes that are potentially more responsive to interventions. Design: We used publicly available data from the Foundation for the National Institutes of Health (FNIH) osteoarthritis (OA) Biomarkers Consortium, in which radiographic progression (medial joint space narrowing of ≥0.7 mm) and pain progression (increase of ≥9 Western Ontario and McMaster Universities Osteoarthritis Index [WOMAC] points) at 48 months defined four mutually exclusive outcome groups (none, both, pain only, radiographic only), along with an extensive set of covariates. We applied distance weighted discrimination (DWD), direction-projection-permutation (DiProPerm) testing, and clustering methods to focus on the contrast (z-scores) between those progressing by both criteria ("progressors") and those progressing by neither ("non-progressors"). Results: Using all observations (597 individuals; 59% women; mean age 62 years; mean BMI 31 kg/m²) and all 73 baseline variables available in the dataset, there was a clear separation between progressors and non-progressors (z = 10.1). Higher z-scores were seen for the magnetic resonance imaging (MRI)-based variables than for demographic/clinical variables or biochemical markers. Baseline variables with the greatest contribution to non-progression at 48 months included WOMAC pain, lateral meniscal extrusion, and serum N-terminal pro-peptide of collagen IIA (PIIANP), while those contributing to progression included bone marrow lesions, osteophytes, medial meniscal extrusion, and urine C-terminal crosslinked telopeptide of type II collagen (CTX-II).
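The DiProPerm idea — project the data onto a separating direction, then permute the group labels to calibrate how large the observed separation is under the null — can be sketched in a few lines. Here the simple mean-difference direction stands in for DWD, and the two groups are synthetic (all names, sizes, and parameters are illustrative assumptions, not the study's data):

```python
import numpy as np

def diproperm_pvalue(X, y, n_perm=500, seed=0):
    """Direction-Projection-Permutation test sketch.
    The mean-difference direction is used as a simple stand-in
    for distance weighted discrimination (DWD)."""
    rng = np.random.default_rng(seed)

    def stat(labels):
        # Direction step: unit vector between the two group means.
        d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
        d /= np.linalg.norm(d)
        # Projection step: 1-D scores, then a mean-difference statistic.
        proj = X @ d
        return proj[labels == 1].mean() - proj[labels == 0].mean()

    observed = stat(y)
    # Permutation step: recompute the statistic under shuffled labels.
    perm_stats = np.array([stat(rng.permutation(y)) for _ in range(n_perm)])
    return observed, (perm_stats >= observed).mean()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)),   # "non-progressors"
               rng.normal(1.5, 1.0, (40, 5))])  # "progressors"
y = np.array([0] * 40 + [1] * 40)
obs, p = diproperm_pvalue(X, y)  # well-separated groups give a tiny p-value
```

Because the direction is re-fit inside every permutation, the test correctly accounts for the optimism of choosing a data-driven separating direction in high dimensions.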
Conclusions: Using methods that provide a way to assess numerous variables of different types and scalings simultaneously in relation to an outcome of interest enabled a data-driven approach that identified key variables associated with a progression phenotype.
Poaching is considered a major driver of the population decline of key species such as tigers, elephants, and rhinos, which can be detrimental to entire ecosystems. While conducting foot patrols is the most commonly used approach to preventing poaching in many countries, such patrols often do not make the best use of the limited patrolling resources.
Although recent work in AI has made great progress in solving large, zero-sum, extensive-form games, the underlying assumption in most past work is that the parameters of the game itself are known to the agents. This paper deals with the relatively under-explored but equally important "inverse" setting, where the parameters of the underlying game are not known to all agents, but must be learned through observations. We propose a differentiable, end-to-end learning framework for addressing this task. In particular, we consider a regularized version of the game, equivalent to a particular form of quantal response equilibrium, and develop 1) a primal-dual Newton method for finding such equilibrium points in both normal and extensive form games; and 2) a backpropagation method that lets us analytically compute gradients of all relevant game parameters through the solution itself. This ultimately lets us learn the game by training in an end-to-end fashion, effectively by integrating a "differentiable game solver" into the loop of larger deep network architectures. We demonstrate the effectiveness of the learning method in several settings including poker and security game tasks.
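As a concrete instance of the "regularized game" this line of work starts from: with entropy regularization, each player's equilibrium strategy is a softmax (logit quantal response) of its payoffs against the opponent. Below is a small stand-in that finds the logit QRE of a zero-sum normal-form game by damped fixed-point iteration — the paper itself uses a primal-dual Newton method and analytically backpropagates through the solution, which this sketch does not attempt; `P`, `tau`, and the iteration schedule are illustrative assumptions:

```python
import numpy as np

def softmax(u, tau):
    z = np.exp((u - u.max()) / tau)  # shift for numerical stability
    return z / z.sum()

def logit_qre(P, tau=1.0, steps=5000, alpha=0.1):
    """Logit quantal response equilibrium of a zero-sum game.
    P[i, j] is the row player's payoff; the column player gets -P[i, j].
    At the fixed point: x = softmax(P y / tau), y = softmax(-P^T x / tau)."""
    m, n = P.shape
    x = np.full(m, 1.0 / m)
    y = np.full(n, 1.0 / n)
    for _ in range(steps):
        # Damped (smoothed) best-response updates toward the QRE fixed point.
        x = (1 - alpha) * x + alpha * softmax(P @ y, tau)
        y = (1 - alpha) * y + alpha * softmax(-P.T @ x, tau)
    return x, y

P = np.array([[2.0, -1.0],
              [-1.0, 1.0]])  # asymmetric zero-sum toy game
x, y = logit_qre(P)
```

Once such a solver exists, the equilibrium conditions can be differentiated implicitly with respect to the payoff parameters, which is what lets the game sit inside a larger network trained end-to-end.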
Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging, and overfishing. However, real-time information such as footprints, and agents' subsequent actions upon receiving that information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill this gap, we first propose a new game model, GSG-I, which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it by training deep Q-networks. Exploiting the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use deep Q-learning for security games.
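The outer loop here is the classic double oracle: solve a restricted game, then grow each player's strategy set with a best response to the current equilibrium, stopping when neither player can improve. A minimal matrix-game version of that loop (best responses below are exact argmaxes over a known payoff matrix `U`, which is an illustrative simplification; in DeDOL the best-response oracles are trained deep Q-networks and the restricted game is over policies):

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(U):
    """Maximin mixed strategy and value for the row player of matrix game U."""
    m, n = U.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0  # maximize the value v
    A_ub = np.hstack([-U.T, np.ones((n, 1))])  # v <= x^T U[:, j] for all j
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=np.append(np.ones(m), 0.0).reshape(1, -1),
                  b_eq=[1.0], bounds=[(0, None)] * m + [(None, None)])
    return res.x[:-1], res.x[-1]

def double_oracle(U, tol=1e-8):
    """Double oracle on a zero-sum matrix game U (row player maximizes)."""
    rows, cols = [0], [0]  # restricted strategy sets, seeded arbitrarily
    while True:
        sub = U[np.ix_(rows, cols)]
        x, v = solve_zero_sum(sub)        # row equilibrium of restricted game
        y, w = solve_zero_sum(-sub.T)     # column equilibrium of restricted game
        # Best responses in the FULL game against the restricted mixtures.
        row_payoffs = U[:, cols] @ y
        col_payoffs = -(x @ U[rows, :])
        br_row = int(np.argmax(row_payoffs))
        br_col = int(np.argmax(col_payoffs))
        added = False
        if br_row not in rows and row_payoffs[br_row] > v + tol:
            rows.append(br_row); added = True
        if br_col not in cols and col_payoffs[br_col] > w + tol:
            cols.append(br_col); added = True
        if not added:                     # neither player can improve: done
            return rows, cols, v

# Rock-paper-scissors: the loop must discover all three strategies.
U = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
rows, cols, v = double_oracle(U)
```

The appeal of this scheme is that the restricted game stays small even when the full strategy space is enormous, which is exactly why it pairs well with expensive learned best-response oracles.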