This article introduces a teacher-student framework for reinforcement learning, synthesising and extending material that appeared in conference proceedings [Torrey, L., & Taylor, M. E. (2013). Teaching on a budget: Agents advising agents in reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems] and in a non-archival workshop paper [Carboni, N., & Taylor, M. E. (2013, May). Preliminary results for 1 vs. 1 tactics in StarCraft. Proceedings of the Adaptive and Learning Agents Workshop (at AAMAS-13)]. In this framework, a teacher agent instructs a student agent by suggesting actions the student should take as it learns. However, the teacher may only give such advice a limited number of times. We present several novel algorithms that teachers can use to budget their advice effectively, and we evaluate them in two complex video games: StarCraft and Pac-Man. Our results show that the same amount of advice, given at different moments, can have different effects on student learning, and that teachers can significantly affect student learning even when students use different learning methods and state representations.
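The core mechanism described above can be sketched in a few lines: a teacher with its own learned Q-values suggests its greedy action only while advice remains in the budget, and only in states it deems important (here, a large spread among its Q-values, one of the budgeting heuristics studied in this line of work). All names and the threshold rule are illustrative, not the paper's exact implementation.

```python
class BudgetedTeacher:
    """Minimal sketch of a teacher that budgets its action advice."""

    def __init__(self, q_table, budget, threshold):
        self.q = q_table          # dict: state -> {action: value}
        self.budget = budget      # total pieces of advice allowed
        self.threshold = threshold

    def importance(self, state):
        # A state is "important" when choosing badly costs a lot:
        # large gap between the best and worst action values.
        values = self.q[state].values()
        return max(values) - min(values)

    def advise(self, state):
        """Return the teacher's greedy action, or None to stay silent."""
        if self.budget > 0 and self.importance(state) >= self.threshold:
            self.budget -= 1
            return max(self.q[state], key=self.q[state].get)
        return None
```

A student would query `advise` each step and fall back to its own policy when the teacher stays silent, so the same budget spent in different states yields different learning outcomes.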
Abstract: In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the teacher's average performance, its variance, and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers, and reveal the non-trivial importance of the coefficient of variation (CV), which relates a policy's variance to its mean, as a statistic for choosing policies that generate advice. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning problem and propose a novel reinforcement learning algorithm capable of learning when to advise and when to abstain. The proposed algorithm can advise even without knowledge of the student's intended action and needs significantly less training time than previous learning approaches. Finally, we argue that learning to advise under a budget is an instance of a more general learning problem: Constrained Exploitation Reinforcement Learning.
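The CV criterion mentioned in the abstract is easy to make concrete: given each candidate teacher policy's episode returns, compute standard deviation divided by mean, and prefer the policy with the lowest CV rather than the highest mean. The helper names and the selection rule below are illustrative (and assume positive mean returns, since CV is undefined or misleading otherwise).

```python
import statistics

def coefficient_of_variation(returns):
    """CV = sample standard deviation / mean of a policy's episode returns.

    A lower CV indicates performance that is consistent relative to its
    average, which the article finds can matter more for teaching than
    raw average performance. Assumes a positive mean.
    """
    return statistics.stdev(returns) / statistics.mean(returns)

def pick_teacher(candidates):
    """Select the candidate policy (name -> list of returns) with lowest CV."""
    return min(candidates,
               key=lambda name: coefficient_of_variation(candidates[name]))
```

For example, a policy averaging lower returns but playing consistently can be selected over a stronger but erratic one, matching the observation that the best performers are not always the best teachers.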
Abstract. In this paper, we investigate using multiple mappings for transfer learning in reinforcement learning tasks. We propose two transfer learning algorithms that can exploit multiple inter-task mappings, for both model-learning and model-free reinforcement learning algorithms. Both incorporate mechanisms to select the appropriate mappings, helping to avoid the phenomenon of negative transfer. The proposed algorithms are evaluated in the Mountain Car and Keepaway domains, and the results show that using multiple inter-task mappings can significantly boost the performance of transfer learning relative to using a single mapping or learning without transfer.
Background Attention deficit hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders during childhood; however, the diagnosis procedure remains challenging, as it is nonstandardized, multiparametric, and highly dependent on subjective evaluation of the perceived behavior. Objective To address the challenges of existing procedures for ADHD diagnosis, the ADHD360 project aims to develop a platform for (1) early detection of ADHD by assessing the user’s likelihood of having ADHD characteristics and (2) providing complementary training for ADHD management. Methods A 2-phase nonrandomized controlled pilot study was designed to evaluate the ADHD360 platform, including ADHD and non-ADHD participants aged 7 to 16 years. At the first stage, an initial neuropsychological evaluation is performed, along with an interaction of approximately 30-45 minutes with the developed serious game (“Pizza on Time”). Subsequently, a 2-week behavior monitoring of the participants through the mADHD360 app is planned, following a telephone conversation between the participants’ parents and the psychologist in which the existence of any ADHD-characteristic behaviors affecting daily functioning is assessed. Once behavior monitoring is complete, the research team invites the participants to the second stage, where they play the game for a mean duration of 10 weeks (2 times per week). Once the serious game is finished, a second round of behavior monitoring is performed following the same procedures as the initial one. During the study, gameplay data were collected and preprocessed. The protocol of the pilot trials was initially designed for in-person participation, but after the COVID-19 outbreak, it was adjusted for remote participation.
State-of-the-art machine learning (ML) algorithms were used to analyze labeled gameplay data, aiming to detect discriminative gameplay patterns between the 2 groups (ADHD and non-ADHD) and estimate a player’s likelihood of having ADHD characteristics. A schema including a train-test split with a 75:25 ratio, k-fold cross-validation with k=3, an ML pipeline, and data evaluation was designed. Results A total of 43 participants were recruited for this study, of whom 18 were diagnosed with ADHD and the remaining 25 were controls. The initial neuropsychological assessment confirmed that the participants in the ADHD group deviated from the participants without ADHD characteristics. A preliminary analysis of collected data consisting of 30 gameplay sessions showed that the trained ML models achieve high performance (ie, accuracy up to 0.85) in correctly predicting the users’ labels (ADHD or non-ADHD) from their gameplay sessions on the ADHD360 platform. Conclusions ADHD360 is characterized by its notable capacity to discriminate player gameplay behavior as either ADHD or non-ADHD. Therefore, the ADHD360 platform could be a valuable complementary tool for early ADHD detection. Trial Registration ClinicalTrials.gov NCT04362982; https://clinicaltrials.gov/ct2/show/NCT04362982 International Registered Report Identifier (IRRID) RR1-10.2196/40189
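The evaluation schema stated above (a 75:25 train-test split plus 3-fold cross-validation on the training portion) can be sketched in plain Python; the study's actual pipeline is not public, so this only mirrors the stated splitting scheme, with illustrative function names.

```python
import random

def train_test_split(samples, test_ratio=0.25, seed=0):
    """Shuffle and split a list of sessions 75:25 (train, test)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n, k=3):
    """Yield (train_indices, val_indices) pairs for k-fold CV over n items."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for val in folds:
        train = [j for f in folds if f is not val for j in f]
        yield train, val
```

Each of the k=3 folds serves once as the validation set while the model trains on the remaining two, and the held-out 25% test split is touched only for the final accuracy estimate.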
The main objective of transfer learning is to reuse knowledge acquired in a previously learned task in order to enhance the learning procedure in a new and more complex task. Transfer learning is thus a suitable solution for speeding up the learning procedure in reinforcement learning tasks. This work proposes a novel method for transferring models to reinforcement learning agents: the models of the transition and reward functions of a source task are transferred to a relevant but different target task. The learning algorithm of the target task's agent takes a hybrid approach, implementing both model-free and model-based learning, in order to fully exploit the presence of a source task model. Moreover, a novel method is proposed for transferring models of potential-based reward shaping functions. The empirical evaluation of the proposed approaches demonstrated significant performance improvements in the 3D Mountain Car and Server Job Scheduling tasks, by successfully using the models generated from the corresponding source tasks.
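Potential-based reward shaping, referenced above, has a standard form worth making explicit: on a transition from state s to s', the bonus added to the environment reward is F(s, s') = γ·Φ(s') − Φ(s), which is known to preserve the optimal policy of the target task. A transferred source-task value function is one natural choice for the potential Φ; the function below is a minimal sketch with illustrative names, not the paper's implementation.

```python
def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Environment reward plus the potential-based shaping bonus.

    phi_s / phi_next: potential Phi at the current and next state,
    e.g. a value function transferred from a source task.
    The bonus gamma * Phi(s') - Phi(s) telescopes along a trajectory,
    so it cannot change which policy is optimal.
    """
    return reward + gamma * phi_next - phi_s
```

With γ = 1 the shaping terms along any trajectory sum to Φ(end) − Φ(start), which is the telescoping property behind the policy-invariance guarantee.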