…If the F1 overlap is less than 0.5 we drop the example, leaving 281,658 of the original 808,731 examples. For NQ, three different settings are used: with all documents as input, with only the gold document, and with a sampled dialogue history context, fol-

Question Answering
  MS MARCO (Nguyen et al., 2016)
  SQuAD (Rajpurkar et al., 2016)
  TriviaQA (Joshi et al., 2017)
  Natural Questions (Kwiatkowski et al., 2019)
  Natural Questions (Open)
  Natural Questions (Open Dialogues) (Adolphs et al., 2021)

Knowledge-Grounded Dialogue
  Wizard of the Internet (Komeili et al., 2022)
  Wizard of Wikipedia (Dinan et al., 2019b)
  Funpedia (Dinan et al., 2020b)

Open-Domain Dialogue
  PersonaChat (Zhang et al., 2018)
  Empathetic Dialogues (Rashkin et al., 2019)
  Blended Skill Talk (Smith et al., 2020)
  Multi-Session Chat (Xu et al., 2022a)
  LIGHT + WILD (Urbanek et al., 2019; Shuster et al., 2021b)

Recovery & Feedback
  SaFeRDialogues (Ung et al., 2022)
  FITS (Xu et al., 2022b)

Task-Oriented Dialogue
  Google SGD (Rastogi et al., 2020)
  Taskmaster (Byrne et al., 2019)
  Taskmaster 2 (Byrne et al., 2019)
  Taskmaster 3 (Byrne et al., 2019)

Table 2: Details of all the training datasets used for fine-tuning the modular tasks.
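The F1-overlap filter described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes "F1 overlap" means word-level F1 between two text fields of an example (the excerpt does not specify which fields are compared), and the `token_f1` and `filter_examples` names are our own.

```python
from collections import Counter

def token_f1(a: str, b: str) -> float:
    """Word-level F1 overlap between two strings (assumed definition)."""
    a_toks, b_toks = a.lower().split(), b.lower().split()
    if not a_toks or not b_toks:
        return 0.0
    # Multiset intersection counts each shared word at most min(count_a, count_b) times.
    overlap = sum((Counter(a_toks) & Counter(b_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(a_toks), overlap / len(b_toks)
    return 2 * precision * recall / (precision + recall)

def filter_examples(pairs, threshold=0.5):
    """Drop pairs whose F1 overlap falls below the threshold, as in the filtering step above."""
    return [(a, b) for a, b in pairs if token_f1(a, b) >= threshold]
```

For example, `token_f1("the cat sat", "the cat ran")` is 2/3 (two of three words shared), so that pair survives the 0.5 threshold, while a pair with no shared words is dropped.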