This paper introduces the game of reconnaissance blind multi-chess (RBMC) as a paradigm and test bed for understanding and experimenting with autonomous decision making under uncertainty, in particular the management of a network of heterogeneous Intelligence, Surveillance and Reconnaissance (ISR) sensors to maintain the situational awareness that informs tactical and strategic decision making. The intent is for RBMC to serve as a common reference or challenge problem in fusion and resource management of heterogeneous sensor ensembles across diverse mission areas. We have defined a basic rule set and a framework for creating more complex versions, developed a web-based software realization to serve as an experimentation platform, and developed some initial machine intelligence approaches to playing the game.
Despite groundbreaking progress in reinforcement learning for robotics, gameplay, and other complex domains, major challenges remain in applying reinforcement learning to the evolving, open-world problems often found in critical application spaces. Reinforcement learning solutions tend to generalize poorly when exposed to new tasks outside the data distribution they were trained on, prompting an interest in continual learning algorithms. In tandem with research on continual learning algorithms, there is a need for challenge environments, carefully designed experiments, and metrics to assess research progress. We address the latter need by introducing a framework for continual reinforcement-learning development and assessment using Lifelong Learning Explorer (L2Explorer), a new, Unity-based, first-person 3D exploration environment that can be continuously reconfigured to generate a range of tasks and task variants structured into complex and evolving evaluation curricula. In contrast to procedurally generated worlds with randomized components, we have developed a systematic approach to defining curricula in response to controlled changes, with accompanying metrics to assess transfer, performance recovery, and data efficiency. Taken together, the L2Explorer environment and evaluation approach provide a framework for developing future evaluation methodologies in open-world settings and rigorously evaluating approaches to lifelong learning.
In recent years, Deep Reinforcement Learning (DRL) approaches have begun to deliver powerful results for a variety of compelling domains, including games such as Chess, Go, and Shogi Silver et al. [2018]; Atari video games Mnih et al. [2013]; more complex strategy video games Berner et al. [2019], Vinyals et al. [2019]; and dexterous robotic manipulation Rajeswaran et al. [2017]. Despite the groundbreaking success in training autonomous agents, the resulting policies tend to be very brittle and generalize poorly Chan et al. [2019].
When presented with a new task or a task variant, DRL approaches are susceptible to a performance drop Zhang et al. [2018], Kirk et al. [2021] due to the catastrophic forgetting problem French [1999], McCloskey and Cohen [1989], which may not be overcome by domain randomization strategies alone. As the field moves from fixed environments to evolving, open-world scenarios, current DRL approaches will be insufficient.
This performance gap has led to an interest in Continual Learning, which seeks to design algorithms that learn over sequences of tasks. In the related, but broader, concept of Lifelong Learning Chen and Liu [2018], an agent learns over a lifetime of experiences (see Fig. 1) in an evolving environment. For the purposes of this paper, however, we treat continual learning as synonymous with lifelong learning, as our approach is applicable to both concepts. Much recent work has been on supervised classification under distribution shifts Song et al. [2020] and learning a sequence of tasks Parisi et al. [2019], Hsu et al. [2018]. Continual RL Khetar...
Over 20,000 children experience a cardiac arrest annually in the U.S., and only 17-50% survive. There is massive variability in the quality of lifesaving cardiopulmonary resuscitation (CPR) that children receive, due to the limited availability of pediatric-specialized emergency resources, and this variability suppresses the survival rate. High-quality CPR is performed to replace the function of the beating heart during a cardiac arrest, preventing the asphyxiation of the vital organs while the inciting process is investigated. A collaboration between the