Reinforcement learning (RL) has achieved tremendous success as a general framework for learning how to make decisions. However, this success relies on the interactive hand-tuning of a reward function by RL experts. In contrast, inverse reinforcement learning (IRL) seeks to learn a reward function from readily obtained human demonstrations. Yet, IRL suffers from two major limitations: 1) reward ambiguity, as there are infinitely many reward functions that could explain an expert's demonstration, and 2) heterogeneity, as human experts adopt varying strategies and preferences, which makes learning from multiple demonstrators difficult under the common assumption that all demonstrators seek to maximize the same reward. In this work, we propose a method to jointly infer a task goal and humans' strategic preferences via network distillation. This approach enables us to distill a robust task reward (addressing reward ambiguity) and to model each strategy's objective (handling heterogeneity). We demonstrate that our algorithm better recovers the task and strategy rewards and imitates the strategies in two simulated tasks and a real-world table tennis task.
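To make the distillation idea above concrete, below is a minimal sketch of one way a shared task reward and per-strategy residual rewards could be combined, assuming PyTorch; the module names, network sizes, and the L2 penalty are illustrative assumptions, not the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Small MLP mapping a state(-action) feature vector to a scalar reward."""

    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)


class DistilledReward(nn.Module):
    """Shared task reward plus one residual reward per demonstrated strategy.

    The combined reward for strategy i is r_task(s) + r_i(s); penalizing the
    magnitude of r_i pushes explanation shared across all strategies into the
    task reward (the distillation idea described in the abstract).
    """

    def __init__(self, obs_dim, n_strategies):
        super().__init__()
        self.task_reward = RewardNet(obs_dim)
        self.strategy_rewards = nn.ModuleList(
            [RewardNet(obs_dim) for _ in range(n_strategies)]
        )

    def forward(self, obs, strategy_idx):
        return self.task_reward(obs) + self.strategy_rewards[strategy_idx](obs)

    def distillation_penalty(self, obs, strategy_idx, weight=1e-2):
        # Illustrative L2 penalty on the strategy-specific component only.
        return weight * self.strategy_rewards[strategy_idx](obs).pow(2).mean()
```

The intuition captured here is that whatever all demonstrators have in common should be explained by the shared task reward, while each strategy network only absorbs the residual preferences of its own demonstrator.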
Multi-agent teaming achieves better performance when participating agents can communicate, allowing them to coordinate their actions to maximize shared utility. However, when a team of agents with different action and observation spaces collaborates, information sharing is not straightforward and requires customized communication protocols that depend on sender and receiver types. Without properly modeling such heterogeneity among agents, communication becomes less helpful and can even degrade multi-agent cooperation performance. We propose heterogeneous graph attention networks, called HetNet, to learn efficient and diverse communication models for coordinating heterogeneous agents on collaborative tasks. We further propose a Multi-Agent Heterogeneous Actor-Critic (MAHAC) learning paradigm to obtain collaborative per-class policies and effective communication protocols for composite robot teams. Our proposed framework is evaluated against multiple baselines in a complex environment in which agents of different types must communicate and cooperate to satisfy the objectives. Experimental results show that HetNet outperforms the baselines in learning sophisticated multi-agent communication protocols, achieving ∼10% improvements in performance metrics.
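The core mechanism described above is attention-based message passing between agents of different classes. Below is a hedged sketch of one such layer, assuming PyTorch; the pairwise sender-to-receiver projections, the single attention head, and the class names are simplifying assumptions for illustration, not HetNet's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeteroGraphAttentionLayer(nn.Module):
    """One round of attention-based message passing between agent classes.

    Each (sender class -> receiver class) pair gets its own projection, so
    agents with different observation/feature sizes can still exchange
    messages; attention weights decide how much each neighbor contributes.
    """

    def __init__(self, in_dims, out_dim):
        super().__init__()
        self.classes = list(in_dims)  # e.g. ["perception", "action"]
        self.proj = nn.ModuleDict({
            f"{s}->{r}": nn.Linear(in_dims[s], out_dim, bias=False)
            for s in self.classes for r in self.classes
        })
        self.attn = nn.ModuleDict({
            f"{s}->{r}": nn.Linear(2 * out_dim, 1, bias=False)
            for s in self.classes for r in self.classes
        })

    def forward(self, feats):
        # feats: dict {class_name: tensor of shape (n_agents_of_class, in_dim)}
        out = {}
        for r in self.classes:
            messages, scores = [], []
            h_r = self.proj[f"{r}->{r}"](feats[r])          # receiver embeddings
            for s in self.classes:
                h_s = self.proj[f"{s}->{r}"](feats[s])       # sender embeddings
                # Score every sender against every receiver of class r.
                pair = torch.cat([
                    h_r.unsqueeze(1).expand(-1, h_s.size(0), -1),
                    h_s.unsqueeze(0).expand(h_r.size(0), -1, -1),
                ], dim=-1)
                scores.append(F.leaky_relu(self.attn[f"{s}->{r}"](pair)).squeeze(-1))
                messages.append(h_s)
            alpha = F.softmax(torch.cat(scores, dim=1), dim=1)  # (n_r, n_all)
            msgs = torch.cat(messages, dim=0)                   # (n_all, out_dim)
            out[r] = F.elu(alpha @ msgs)                        # aggregated messages
        return out
```

As a usage example, constructing the layer with in_dims={"perception": 16, "action": 8} and calling it on {"perception": torch.randn(3, 16), "action": torch.randn(2, 8)} returns one updated embedding tensor per agent class.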
The development of human-robot systems able to leverage the strengths of both humans and their robotic counterparts has been greatly sought after because of the foreseen, broad-ranging impact across industry and research. We believe the true potential of these systems cannot be reached unless the robot is able to act with a high level of autonomy, reducing the burden of manual tasking or teleoperation. To achieve this level of autonomy, robots must be able to work fluidly with their human partners, inferring their needs without explicit commands. This inference requires the robot to detect and classify the heterogeneity of its partners. We propose a framework for learning from heterogeneous demonstrations based upon Bayesian inference and evaluate a suite of approaches on a real-world dataset of gameplay from StarCraft II. This evaluation provides evidence that our Bayesian approach can outperform conventional methods by up to 12.8%.
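Since the abstract does not spell out the inference step, the following is a small illustrative sketch of Bayesian inference over latent demonstrator types: given candidate per-type policies, a posterior over which type produced an observed trajectory is computed by Bayes' rule. The policies, prior, and toy trajectory here are made-up examples, not the paper's models or data.

```python
import numpy as np


def posterior_over_types(trajectory, type_policies, prior):
    """Infer which latent demonstrator type produced a trajectory.

    trajectory:     list of (state, action) pairs (discrete for simplicity)
    type_policies:  list of functions pi_k(action, state) -> probability
    prior:          prior probabilities over the K types

    Returns p(type | trajectory) under the assumed model
    p(type | traj) proportional to p(type) * prod_t pi_type(a_t | s_t).
    Log-space accumulation avoids underflow on long trajectories.
    """
    log_post = np.log(np.asarray(prior, dtype=float))
    for state, action in trajectory:
        for k, policy in enumerate(type_policies):
            log_post[k] += np.log(policy(action, state) + 1e-12)
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Toy usage: two hypothetical demonstrator types with different action preferences.
aggressive = lambda a, s: 0.8 if a == "attack" else 0.2
defensive = lambda a, s: 0.3 if a == "attack" else 0.7
traj = [(0, "attack"), (1, "attack"), (2, "defend")]
print(posterior_over_types(traj, [aggressive, defensive], prior=[0.5, 0.5]))
```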
As high-speed, agile robots become more commonplace, these robots will have the potential to better aid and collaborate with humans. However, due to the increased agility and functionality of these robots, close collaboration with humans can create safety concerns that alter team dynamics and degrade task performance. In this work, we aim to enable the deployment of safe and trustworthy agile robots that operate in proximity with humans. We do so by 1) proposing a novel human-robot doubles table tennis scenario to serve as a testbed for studying agile, proximate human-robot collaboration and 2) conducting a user study to understand how attributes of the robot (e.g., robot competency or capacity to communicate) impact team dynamics, perceived safety, and perceived trust, and how these latent factors affect human-robot collaboration (HRC) performance. We find that robot competency significantly increases perceived trust (𝑝 < .001), extending skill-to-trust assessments in prior studies to agile, proximate HRC. Furthermore, interestingly, we find that when the robot vocalizes its intention to perform a task, team performance (𝑝 = .037) and perceived safety of the system (𝑝 = .009) significantly decrease.
Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse, decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that achieve parity with or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration. 1) We propose Interpretable Continuous Control Trees (ICCTs), a novel tree-based model that can be optimized via gradient descent with modern RL algorithms to produce high-performance, interpretable continuous control policies. We provide several extensions to prior differentiable decision tree (DDT) frameworks, in which each decision node i checks a rule of the form x_k − b_i > 0 over a single selected feature. As each leaf node is represented as a probability mass function over output classes in prior work, leaf nodes, l, must be modified to produce continuous control outputs.
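As a rough illustration of the two components mentioned above (sparse, single-feature decision nodes that can be trained with gradients yet read as crisp rules, and leaves that output continuous actions rather than class probabilities), here is a simplified sketch in PyTorch; the soft feature-selection trick, parameter names, and linear-controller leaves are assumptions for illustration, not the exact ICCT formulation.

```python
import torch
import torch.nn as nn


class SparseDecisionNode(nn.Module):
    """Decision node that (illustratively) routes on a single feature.

    During training the split is soft (a sigmoid of a weighted feature minus a
    threshold) so the tree is differentiable end-to-end; at evaluation time it
    can be read as the crisp rule "go right iff x[k] − b > 0" for the most
    heavily weighted feature k.
    """

    def __init__(self, obs_dim):
        super().__init__()
        self.feature_logits = nn.Parameter(torch.zeros(obs_dim))  # learns which feature to split on
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x, crisp=False):
        if crisp:
            k = self.feature_logits.argmax()                       # single selected feature
            return (x[..., k] - self.bias > 0).float()
        weights = torch.softmax(self.feature_logits, dim=0)        # soft feature selection
        return torch.sigmoid(x @ weights - self.bias)


class LinearControllerLeaf(nn.Module):
    """Leaf that outputs a continuous action instead of class probabilities."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.linear = nn.Linear(obs_dim, act_dim)

    def forward(self, x):
        return self.linear(x)
```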