2020
DOI: 10.1007/978-3-030-33950-0_31

Interactive Learning with Corrective Feedback for Policies Based on Deep Neural Networks

Abstract: Deep Reinforcement Learning (DRL) has become a powerful strategy to solve complex decision making problems based on Deep Neural Networks (DNNs). However, it is highly data demanding, so unfeasible in physical systems for most applications. In this work, we approach an alternative Interactive Machine Learning (IML) strategy for training DNN policies based on human corrective feedback, with a method called Deep COACH (D-COACH). This approach not only takes advantage of the knowledge and insights of human teacher…

Cited by 16 publications (17 citation statements)
References 10 publications (15 reference statements)
“…The traditional means of passing task information to an agent include specifying a reward function (Barto and Sutton 1998) that can be hand-crafted for the task (Singh, Lewis, and Barto 2009; Levine et al 2016; Chebotar et al 2017) and providing demonstrations (Schaal 1999; Abbeel and Ng 2004) before the agent starts training. More recent works explore the concept of the human supervision being provided throughout training by either providing rewards during training (Isbell et al 2001; Thomaz et al 2005; Warnell et al 2018; Perez-Dattari et al 2018) or demonstrations during training; either continuously (Ross, Gordon, and Bagnell 2011b; Kelly et al 2018) or at the agent's discretion (Ross, Gordon, and Bagnell 2011a; Borsa et al 2017; Xu et al 2018; Hester et al 2018; James, Bloesch, and Davison 2018; Yu et al 2018a; Krening 2018; Brown, Cui, and Niekum 2018). In all of these cases, however, the reward and demonstrations are the sole means of interaction.…”
Section: Related Work (mentioning)
confidence: 99%
“…The traditional means of passing task information to an agent include specifying a reward function [4,6] that can be hand-crafted for the task [46,30,9] and providing demonstrations [44,1] before the agent starts training. More recent works explore the concept of the human supervision being provided throughout training by either providing rewards during training [51,47,23,39] or demonstrations during training; either continuously [25,42] or at the agent's discretion [53,20,24,55,7,27,41,8]. In all of these cases, however, the reward and demonstrations are the sole means of interaction.…”
Section: Related Work (mentioning)
confidence: 99%
“…Khaled Karim and Hossein (2019) explained that there are four categories in corrective feedback analysis: clarification request, Recast, Elicitation, and Metalinguistic Feedback. Through corrective feedback, students realize where their mistakes are and deepen their understanding of the knowledge gained through learning experiences so that learning difficulties can be overcome and ultimately, the quality of learning outcomes will be better (Celemin, & Ruiz-del-Solar, 2019; Pérez-Dattari et al, 2018; Chen et al, 2018). Corrective feedback is a lecturer's response to student learning errors.…”
Section: Introduction (mentioning)
confidence: 99%