2021
DOI: 10.1162/tacl_a_00428

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

Abstract: We study continual learning for natural language instruction generation, by observing human users’ instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication of the system’s success in communicating its intent. We show how to use this signal to improve the system’s ability to generate instructions via contextual bandit learning. …
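To make the abstract's learning signal concrete, here is a minimal sketch of contextual-bandit learning from observed instruction execution, under assumed interfaces: the instruction templates, intent features, similarity-based reward, and simulated follower are all illustrative stand-ins, not the paper's actual model.

```python
import numpy as np

# Toy contextual bandit for instruction generation (illustrative only).
# Context: features of the system's intended plan.
# Arms: candidate instruction templates.
# Reward: similarity between the user's observed execution and the intent.

rng = np.random.default_rng(0)

N_FEATURES = 4  # hypothetical intent features
TEMPLATES = ["go left then forward", "circle the tree", "follow the road"]
weights = np.zeros((len(TEMPLATES), N_FEATURES))  # log-linear policy

def policy_probs(context):
    logits = weights @ context
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def intent_similarity(executed, intended):
    # Hypothetical reward: overlap between executed and intended waypoints.
    return len(set(executed) & set(intended)) / max(len(intended), 1)

def bandit_step(context, intended_path, simulate_user, lr=0.1):
    probs = policy_probs(context)
    arm = rng.choice(len(TEMPLATES), p=probs)      # sample an instruction
    executed_path = simulate_user(TEMPLATES[arm])  # observe human following
    reward = intent_similarity(executed_path, intended_path)
    # Softmax policy-gradient update for a contextual bandit:
    # grad of log pi(arm | context) = (indicator[arm] - probs) * context
    grad = -np.outer(probs, context)
    grad[arm] += context
    weights[:] += lr * reward * grad
    return reward

# One simulated round; in the paper this feedback comes from real followers.
demo_user = lambda instruction: ["A", "B"] if "left" in instruction else ["C"]
print(bandit_step(np.ones(N_FEATURES), ["A", "B", "D"], demo_user))
```

The key design point the sketch mirrors is that no gold instruction is ever observed: the only supervision is how well the follower's execution matches the system's original intent.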

Cited by 8 publications (11 citation statements). References 35 publications.
“…Beyond the output itself, there are a few concerns over the generation of the models of ChatGPT. Because the mechanisms and intricacies of these algorithms have been extensively detailed in other resources, 26-31 we will not detail these here. However, we will expand on the process of supervised learning in construction of this model.…”
Section: Discussion
Citation type: mentioning
Confidence: 99%
“…15 The mean DISCERN score across 40 ChatGPT responses was 44.2 ± 7.4, indicating that the material was overall of "fair" quality. Three responses (7.5%) were found to be of "poor" quality (DISCERN score 27-38), 34 responses (68.0%) were found to be of "fair" quality (DISCERN score 39-50), and 3 responses (7.5%) were found to be of "good" quality (DISCERN score 51-62). There were no responses found to be of "excellent" or "very poor" quality.…”
Section: Response Quality
Citation type: mentioning
Confidence: 99%
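The quoted banding is a simple threshold mapping; as a minimal sketch (band boundaries taken from the excerpt above, the function name is hypothetical, and bands outside the quoted ranges are left unspecified):

```python
def discern_band(score: float) -> str:
    """Classify a DISCERN total score using the bands quoted above."""
    if 27 <= score <= 38:
        return "poor"
    if 39 <= score <= 50:
        return "fair"
    if 51 <= score <= 62:
        return "good"
    # "very poor" and "excellent" boundaries are not given in the excerpt,
    # so scores outside the quoted ranges are left unclassified here.
    return "unspecified"

assert discern_band(44.2) == "fair"  # matches the reported mean
```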
“…For example, if the language generated by a speaker, although correct, is difficult to understand, this calls for unnecessary interpretation effort from the other agent. To measure whether pragmatic agents enable efficient communication, evaluations can use metrics of communicative cost (Walker et al., 1997) such as time to task completion, utterance length and complexity (Effenberger et al., 2021), measures such as lexical entrainment (Clark and Wilkes-Gibbs, 1986; Parent and Eskenazi, 2010; Hawkins et al., 2020), and quality ratings (Kojima et al., 2021).…”
Section: Evaluating Pragmatic Models
Citation type: mentioning
Confidence: 99%
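The communicative-cost metrics this excerpt lists are straightforward to compute from dialogue logs. Below is a minimal sketch; the Turn record, its fields, and the token-count proxy for utterance complexity are assumptions for illustration, not the cited papers' definitions.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    utterance: str
    timestamp: float  # seconds since task start

def utterance_lengths(turns: list[Turn], speaker: str) -> list[int]:
    """Utterance length in tokens, one simple proxy for complexity."""
    return [len(t.utterance.split()) for t in turns if t.speaker == speaker]

def time_to_completion(turns: list[Turn]) -> float:
    """Elapsed time from first to last turn, a coarse communicative cost."""
    return turns[-1].timestamp - turns[0].timestamp

log = [
    Turn("leader", "go around the tree and stop at the lamp", 0.0),
    Turn("follower", "which tree?", 4.1),
    Turn("leader", "the big one on your left", 7.8),
]
print(utterance_lengths(log, "leader"))  # [9, 6]
print(time_to_completion(log))           # 7.8
```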
“…A follow-up, Kojima et al. (2021), is especially relevant; in that work they show how multiple rounds of learning can continue to improve the language generation capabilities of a "leader" model. In addition to the embodied agents and players, our work shares with Kojima et al. (2021) multiple rounds of data collection and the use of player feedback after "execution" to label examples. However, the key difference is that in Kojima et al. (2021) the agent is a single ML model, whereas in this work, we aim to show that credit can be assigned to different components in a modular system, the data for the component can be annotated, and the component re-trained without any engineer intervention.…”
Section: Related Work
Citation type: mentioning
Confidence: 99%
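The contrast this excerpt draws, a single end-to-end model versus per-component credit assignment in a modular pipeline, can be sketched as a retraining loop. Everything here (the Component stand-in, the string-matching blame heuristic, the retrain stub) is a hypothetical illustration, not the cited systems' actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """Toy stand-in for one module (e.g. a parser or a generator)."""
    name: str
    error_rate: float                        # pretend quality measure
    data: list = field(default_factory=list)

    def blamed_by(self, feedback: str) -> bool:
        # Hypothetical heuristic: feedback strings name the failing module.
        return self.name in feedback

    def retrain(self, labels: list) -> None:
        self.data.extend(labels)
        self.error_rate *= 0.9               # pretend retraining helps

def continual_round(feedback_log: list[str], pipeline: list[Component]) -> None:
    """Assign each piece of player feedback to one component, then retrain
    only the blamed components -- no engineer intervention."""
    buckets: dict[str, list[str]] = {c.name: [] for c in pipeline}
    for feedback in feedback_log:
        for comp in pipeline:
            if comp.blamed_by(feedback):
                buckets[comp.name].append(feedback)
                break                        # credit goes to one module
    for comp in pipeline:
        if buckets[comp.name]:
            comp.retrain(buckets[comp.name])

pipeline = [Component("parser", 0.3), Component("generator", 0.4)]
continual_round(["generator said 'left' but path went right"], pipeline)
print(pipeline[1].error_rate)  # ~0.36: only the generator was retrained
```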