Interactive Learning with Corrective Feedback for Policies Based on Deep Neural Networks

Pérez-Dattari, Rodrigo; Celemin, Carlos; Ruiz‐del‐Solar, Javier; Kober, Jens

doi:10.1007/978-3-030-33950-0_31

Cited by 16 publications

(17 citation statements)

References 10 publications

(15 reference statements)

“…The traditional means of passing task information to an agent include specifying a reward function (Barto and Sutton 1998) that can be hand-crafted for the task (Singh, Lewis, and Barto 2009; Levine et al 2016; Chebotar et al 2017) and providing demonstrations (Schaal 1999; Abbeel and Ng 2004) before the agent starts training. More recent works explore the concept of the human supervision being provided throughout training by either providing rewards during training (Isbell et al 2001; Thomaz et al 2005; Warnell et al 2018; Perez-Dattari et al 2018) or demonstrations during training; either continuously (Ross, Gordon, and Bagnell 2011b; Kelly et al 2018) or at the agent's discretion (Ross, Gordon, and Bagnell 2011a; Borsa et al 2017; Xu et al 2018; Hester et al 2018; James, Bloesch, and Davison 2018; Yu et al 2018a; Krening 2018; Brown, Cui, and Niekum 2018). In all of these cases, however, the reward and demonstrations are the sole means of interaction.…”

Section: Related Work (mentioning)

confidence: 99%

Learning to Interactively Learn and Assist

Woodward

Finn

Hausman

2020

AAAI

When deploying autonomous agents in the real world, we need effective ways of communicating objectives to them. Traditional skill learning has revolved around reinforcement and imitation learning, each with rigid constraints on the format of information exchanged between the human and the agent. While scalar rewards carry little information, demonstrations require significant effort to provide and may carry more information than is necessary. Furthermore, rewards and demonstrations are often defined and collected before training begins, when the human is most uncertain about what information would help the agent. In contrast, when humans communicate objectives with each other, they make use of a large vocabulary of informative behaviors, including non-verbal communication, and often communicate throughout learning, responding to observed behavior. In this way, humans communicate intent with minimal effort. In this paper, we propose such interactive learning as an alternative to reward or demonstration-driven learning. To accomplish this, we introduce a multi-agent training framework that enables an agent to learn from another agent who knows the current task. Through a series of experiments, we demonstrate the emergence of a variety of interactive learning behaviors, including information-sharing, information-seeking, and question-answering. Most importantly, we find that our approach produces an agent that is capable of learning interactively from a human user, without a set of explicit demonstrations or a reward function, and achieving significantly better performance cooperatively with a human than a human performing the task alone.

“…The traditional means of passing task information to an agent include specifying a reward function [4,6] that can be hand-crafted for the task [46,30,9] and providing demonstrations [44,1] before the agent starts training. More recent works explore the concept of the human supervision being provided throughout training by either providing rewards during training [51,47,23,39] or demonstrations during training; either continuously [25,42] or at the agent's discretion [53,20,24,55,7,27,41,8]. In all of these cases, however, the reward and demonstrations are the sole means of interaction.…”

Section: Related Work (mentioning)

confidence: 99%

Learning to Interactively Learn and Assist

Woodward

Finn

Hausman

2019

Preprint

When deploying autonomous agents in the real world, we need effective ways of communicating objectives to them. Traditional skill learning has revolved around reinforcement and imitation learning, each with rigid constraints on the format of information exchanged between the human and the agent. While scalar rewards carry little information, demonstrations require significant effort to provide and may carry more information than is necessary. Furthermore, rewards and demonstrations are often defined and collected before training begins, when the human is most uncertain about what information would help the agent. In contrast, when humans communicate objectives with each other, they make use of a large vocabulary of informative behaviors, including non-verbal communication, and often communicate throughout learning, responding to observed behavior. In this way, humans communicate intent with minimal effort. In this paper, we propose such interactive learning as an alternative to reward or demonstration-driven learning. To accomplish this, we introduce a multi-agent training framework that enables an agent to learn from another agent who knows the current task. Through a series of experiments, we demonstrate the emergence of a variety of interactive learning behaviors, including information-sharing, information-seeking, and question-answering. Most importantly, we find that our approach produces an agent that is capable of learning interactively from a human user, without a set of explicit demonstrations or a reward function, and achieving significantly better performance cooperatively with a human than a human performing the task alone. * Work done as part of the Google AI Residency program. Preprint. Under review.

“…Khaled Karim and Hossein (2019) explained that there are four categories in corrective feedback analysis: clarification request, Recast, Elicitation, and Metalinguistic Feedback. Through corrective feedback, students realize where their mistakes are and deepen their understanding of the knowledge gained through learning experiences so that learning difficulties can be overcome and ultimately, the quality of learning outcomes will be better (Celemin, & Ruiz-del-Solar, 2019; Pérez-Dattari et al, 2018; Chen et al, 2018). Corrective feedback is a lecturer's response to student learning errors.…”

Section: Introduction (mentioning)

confidence: 99%

Corrective Feedback in Learning Interaction: Integration of Surface Strategy Taxonomy

Sari

Septiyana

Suhono

et al. 2021

AIJP

The article aimed to determine the types of errors found in classroom learning interactions at PTKI (Perguruan Tinggi Keagamaan Islam) Metro, to analyze the strategies used to correct student errors in those interactions, and to identify the aspects of the Surface Strategy Taxonomy found in the errors. In analyzing the data, the researchers used the theory of Dalton-Puffer (2007) to identify and describe the common types of student errors in English classroom interaction. The theory of Mendez et al. (2010) was then applied to analyze the types of lecturer strategies for correcting student errors. The researchers also analyzed the linguistic aspects of the taxonomy categories in the students' errors using the theory of Dulay, Burt, and Krashen (1982). The results show that the corrective feedback strategies used by lecturers at PTKI Metro City were Explicit Correction, Recast, Clarification Request, and Metalinguistic Feedback. The study also classifies the types of errors according to the Surface Strategy Taxonomy.
