We describe a probabilistic framework for synthesizing control policies for general multi-robot systems, given environment and sensor models and a cost function. Decentralized, partially observable Markov decision processes (Dec-POMDPs) are a general model of decision processes in which a team of agents must cooperate to optimize some objective (specified by a shared reward or cost function) in the presence of uncertainty, but where communication limitations mean that the agents cannot share their state, so execution must proceed in a decentralized fashion. While Dec-POMDPs are typically intractable to solve for real-world problems, recent research on the use of macro-actions in Dec-POMDPs has significantly increased the size of problems that can be practically solved as Dec-POMDPs. We describe this general model and show how, in contrast to most existing methods that are specialized to a particular problem class, it can synthesize control policies that exploit whatever opportunities for coordination are present in the problem, while trading off uncertainty in outcomes, sensor information, and information about other agents. We use three variations on a warehouse task to show that a single planner of this type can generate cooperative behavior using task allocation, direct communication, and signaling, as appropriate.
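To make the model concrete, the Dec-POMDP components described above can be sketched as a data structure. This is a minimal illustrative sketch of the standard discrete formulation, not the interface of any particular planner; all names are hypothetical.

```python
# Illustrative sketch of a Dec-POMDP tuple: a team of agents, hidden
# global states, per-agent actions and observations, a joint transition
# and observation model, and a single shared reward (the cooperative
# objective). Names and types are assumptions for exposition only.
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

JointAction = Tuple[str, ...]   # one action per agent
JointObs = Tuple[str, ...]      # one observation per agent

@dataclass
class DecPOMDP:
    agents: Tuple[str, ...]                    # the cooperating team
    states: FrozenSet[str]                     # global (hidden) states
    actions: Dict[str, FrozenSet[str]]         # per-agent action sets
    observations: Dict[str, FrozenSet[str]]    # per-agent observation sets
    transition: Callable[[str, JointAction], Dict[str, float]]   # P(s' | s, a)
    observe: Callable[[str, JointAction], Dict[JointObs, float]] # P(o | s', a)
    reward: Callable[[str, JointAction], float]  # shared team reward R(s, a)
    discount: float
```

The key structural point is that the reward is shared (one function for the whole team) while observations are per-agent, which is exactly what forces decentralized execution.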
Smart environments offer valuable technologies for activity monitoring and health assessment. Here, we describe an integration of robots into smart environments to provide more interactive support for individuals with functional limitations. RAS, our Robot Activity Support system, combines smart environment sensing, object detection and mapping, and robot interaction to detect and assist with activity errors that may occur in everyday settings. We describe the components of the RAS system and demonstrate its use in a smart home testbed. To evaluate the usability of RAS, we also collected and analyzed feedback from participants who received assistance from RAS in a smart home setting as they performed routine activities.
We introduce a principled method for multi-robot coordination based on a general model (termed a MacDec-POMDP) of multi-robot cooperative planning in the presence of stochasticity, uncertain sensing, and communication limitations. A new MacDec-POMDP planning algorithm is presented that searches over policies represented as finite-state controllers, rather than the previous policy tree representation. Finite-state controllers can be much more concise than trees, are much easier to interpret, and can operate over an infinite horizon. The resulting policy search algorithm requires a substantially simpler simulator that models only the outcomes of executing a given set of motor controllers, not the details of the executions themselves, and it can solve significantly larger problems than existing MacDec-POMDP planners. We demonstrate significant performance improvements over previous methods and show that our method can be used for actual multi-robot systems through experiments on a cooperative multi-robot bartending domain.
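The finite-state controller representation mentioned above can be sketched in a few lines. This is a generic illustration of an FSC policy for a single agent, assuming discrete observations and deterministic node transitions; the class and its names are hypothetical, not the paper's implementation.

```python
# Sketch of a finite-state controller (FSC) policy: each controller
# node is labeled with a (macro-)action to execute, and observations
# drive transitions between nodes. Unlike a policy tree, the node set
# is fixed, so the policy is compact and runs for an infinite horizon.
class FiniteStateController:
    def __init__(self, action_of, next_node, start=0):
        self.action_of = action_of    # node -> action to execute
        self.next_node = next_node    # (node, observation) -> next node
        self.node = start

    def act(self):
        """Return the action attached to the current controller node."""
        return self.action_of[self.node]

    def observe(self, observation):
        """Advance the controller state on the latest observation."""
        self.node = self.next_node[(self.node, observation)]
```

For example, a two-node controller can encode "keep searching until the target is seen, then grasp" with constant memory, whereas a policy tree encoding the same behavior grows with the horizon.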
Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators to learn directly from raw input images. However, learning directly from raw images is data inefficient: the agent must learn a feature representation of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning and often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable to real-world settings where data is expensive. In this work, we improve data efficiency in deep RL by addressing one of these two learning goals, feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations and empirically evaluate our approach using the Asynchronous Advantage Actor-Critic (A3C) algorithm in six Atari games (Bellemare et al. 2013). Unlike previous work, where a large amount of expert human data is required to achieve a good initial performance boost, our approach shows significant learning speed improvements in all experiments with only a relatively small amount of noisy, non-expert demonstration data. The simplicity of our approach makes it generally adaptable to other deep RL algorithms, and potentially to other domains, since the collection of demonstration data becomes easy. In addition, we apply Gradient-weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al. 2017) to learned feature maps for both the human data and the agent data, providing a detailed analysis of why pre-training helps to speed up learning. Our work makes the following contributions:
1. We show that pre-training on a small amount of non-expert human demonstration data is sufficient to achieve significant performance improvements.
2. We are the first to apply the transformed Bellman (TB) operator (Pohlen et al. 2018) in the A3C algorithm, further improving A3C's performance for both the baseline and pre-training methods.
3. We propose a modified version of the Grad-CAM method (Selvaraju et al. 2017) and are the first to provide an empirical analysis of what features are learned from pre-training, indicating why pre-training on human demonstration data helps.
4. We release our code and all collected human demonstration data at https://github.com/gabrieledcjr/DeepRL.
This article is organized as follows. In the next section, we review related work on using pre-training to improve data efficiency. Section 3 provides background on deep RL algorithms and the transformed Bellman operator. Section 4 proposes our pre-training methods for deep RL. Section 5 describes the experimental design. Results and analysis are presented in Section 6. We conclude this article in Section 7 with a discussion.
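The transformed Bellman operator referenced in the contributions rescales returns through an invertible squashing function. Below is a sketch of that scalar transform and its closed-form inverse as given by Pohlen et al. (2018); how it is wired into A3C's value targets is left out here.

```python
# Sketch of the transformed Bellman (TB) squashing function
# h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x  (Pohlen et al. 2018)
# and its closed-form inverse. Applying h to value targets compresses
# large-magnitude returns, which stabilizes learning across games with
# very different reward scales.
import math

def h(x, eps=1e-2):
    """Squash a return/value; the eps * x term keeps h invertible."""
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x

def h_inv(y, eps=1e-2):
    """Closed-form inverse of h, recovering the unsquashed value."""
    return math.copysign(1.0, y) * (
        ((math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0)
         / (2.0 * eps)) ** 2 - 1.0
    )
```

In the TB formulation, a one-step target takes the form h(r + γ · h_inv(V(s'))), so the value network always predicts in the squashed space while bootstrapping happens in the original reward space.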