2022
DOI: 10.48550/arxiv.2208.03188
Preprint

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user-defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (architecture, model and training scheme), and details of its deployment, including safety mechanisms. Human evaluations…
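The BB3-175B weights and serving code are released separately by the authors; as a minimal, assumption-laden sketch of the basic "utterance in, response out" interaction pattern only, the snippet below loads a much smaller, earlier BlenderBot checkpoint from the Hugging Face hub (not BB3 itself, and without BB3's internet search or long-term memory modules) and generates a single reply.

```python
# Minimal sketch: an earlier, small BlenderBot checkpoint stands in for BB3-175B
# purely to illustrate the interaction pattern; this is not the deployed BB3 system.
from transformers import BlenderbotForConditionalGeneration, BlenderbotTokenizer

name = "facebook/blenderbot-400M-distill"  # smaller, earlier BlenderBot; not BB3-175B
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

utterance = "My garden is overrun with weeds, any advice?"
inputs = tokenizer([utterance], return_tensors="pt")
reply_ids = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
```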

Cited by 18 publications (28 citation statements) · References 56 publications

“…Recently several language models, such as Blenderbot [18], Lamda [20], and ChatGPT [13], have been introduced that are specifically tuned for dialog applications, but conversational interaction can also be achieved via prompt engineering with general-purpose large language models. Valvoda et al found that fine-tuning a large language model for dialog resulted in duller and more repetitive output, while generating dynamic prompts resulted in more novel and diverse responses [21].…”
Section: Related Work
confidence: 99%
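As a hedged illustration of the "prompt engineering with general-purpose large language models" route the excerpt above describes (not code from any of the cited papers), the sketch below dynamically builds a dialogue prompt from a persona string and the running conversation history and feeds it to an off-the-shelf causal LM; the model name, persona text, and prompt format are illustrative assumptions.

```python
# Sketch of steering a general-purpose causal LM into dialogue by dynamically
# constructing the prompt, rather than fine-tuning the model for dialog.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any general-purpose causal LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def build_prompt(history, persona="You are a helpful, friendly assistant."):
    # Dynamic prompt: persona + recent turns, ending with the bot's turn marker.
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return f"{persona}\n{turns}\nBot:"

history = [("User", "Hi! What should I cook tonight?")]
inputs = tokenizer(build_prompt(history), return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```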
“…• OPT-175B (Zhang et al, 2022) and BB3-175B (Shuster et al, 2022b): we compare the 175B language model OPT (either 0-shot or few-shot) with BlenderBot 3, which is fine-tuned with conversational datasets including modular supervision and internet-augmentation from our task. This setting examines if our experiments and results are applicable to very large language models.…”
Section: Deployed Models
confidence: 99%
“…Very large models benefit from feedback from smaller models. OPT-175B, either in zero-shot or few-shot variants, is only pre-trained on dialogue data and not fine-tuned on our task, and performs reasonably, but not better than smaller models that are fine-tuned. BlenderBot 3 (Shuster et al, 2022b) is trained with the modular supervision feedback data collected from the smaller (3B parameter) models, in addition to fine-tuning on other standard dialogue datasets. This model provides the best human evaluation metrics of all the systems we test, with a good response rate of 64.8% and a rating of 4.08.…”
Section: Modular Superior To Non-modular Feedback
confidence: 99%
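To make the zero-shot vs. few-shot distinction in the comparison above concrete, here is a minimal sketch of the two prompt-construction styles: zero-shot feeds the dialogue context alone, while few-shot prepends a handful of example exchanges before the same context. The demonstration pairs and formatting are invented for illustration and are not the cited paper's actual prompts.

```python
# Sketch only: assumed demonstration pairs and prompt layout for zero- vs. few-shot prompting.
few_shot_examples = [
    ("What's a good beginner houseplant?", "A pothos is hard to kill and grows fast."),
    ("Any tips for sleeping better?", "Keeping a fixed bedtime and dimming lights helps."),
]

def zero_shot_prompt(context: str) -> str:
    # No demonstrations: the model sees only the current context.
    return f"Context: {context}\nResponse:"

def few_shot_prompt(context: str) -> str:
    # A few worked examples precede the current context in the same format.
    demos = "\n".join(f"Context: {c}\nResponse: {r}" for c, r in few_shot_examples)
    return f"{demos}\nContext: {context}\nResponse:"

print(zero_shot_prompt("How do I keep bread from going stale?"))
print(few_shot_prompt("How do I keep bread from going stale?"))
```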
“…Our goal here was to learn a direct mapping from an intent-attribute-value input text to a human-readable Portuguese text. We tested four text-to-text transformer-based models: Bart [Lewis et al 2019], T5 [Xue et al 2020], Blenderbot [Shuster et al 2022] and GPT2 [Radford et al 2019]. While GPT2 utilizes a decoder-only module, the first three models use an encoder-decoder scheme [Cho et al 2014].…”
Section: End-to-end Architecture
confidence: 99%
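A small, hedged sketch of the architectural contrast this excerpt draws (the checkpoint names and input string are stand-ins, not the cited study's exact models or data): BART, T5, and BlenderBot load as encoder-decoder sequence-to-sequence models, while GPT-2 loads as a decoder-only causal LM, so the two families are driven through different model classes even though both map text to text.

```python
# Sketch contrasting the two model families named above; checkpoints are illustrative.
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM, AutoTokenizer

# Encoder-decoder: the input is encoded, and a separate decoder generates the output.
s2s_tok = AutoTokenizer.from_pretrained("facebook/bart-base")
s2s_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Decoder-only: input and output share one left-to-right token stream.
clm_tok = AutoTokenizer.from_pretrained("gpt2")
clm_model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "intent=inform | attribute=price | value=low"  # assumed input format, for illustration
s2s_out = s2s_model.generate(**s2s_tok(prompt, return_tensors="pt"), max_new_tokens=30)
clm_out = clm_model.generate(**clm_tok(prompt, return_tensors="pt"), max_new_tokens=30)
print(s2s_tok.decode(s2s_out[0], skip_special_tokens=True))
print(clm_tok.decode(clm_out[0], skip_special_tokens=True))
```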