Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

Hancock, Braden; Bordes, Antoine; Mazaré, Pierre-Emmanuel; Weston, Jason

doi:10.48550/arxiv.1901.05415

Cited by 25 publications

(43 citation statements)

References 0 publications

“…We also build on the growing body of work that fine-tunes models with human feedback. This has been applied in many domains including summarization (Böhm et al, 2019;Ziegler et al, 2019;Stiennon et al, 2020), dialogue (Jaques et al, 2019;Yi et al, 2019;Hancock et al, 2019), translation (Kreutzer et al, 2018;Bahdanau et al, 2016), semantic parsing (Lawrence and Riezler, 2018), story generation (Zhou and Xu, 2020), review generation (Cho et al, 2018), and evidence extraction (Perez et al, 2019), and agents in simulated environments (Christiano et al, 2017;Ibarz et al, 2018).…”

Section: Related Work (mentioning)

confidence: 99%

Recursively Summarizing Books with Human Feedback

Wu, Ouyang, Ziegler et al. 2021

Preprint

A major challenge for scaling machine learning is training models to perform tasks that are very difficult or time-consuming for humans to evaluate. We present progress on this problem on the task of abstractive summarization of entire fiction novels. Our method combines learning from human feedback with recursive task decomposition: we use models trained on smaller parts of the task to assist humans in giving feedback on the broader task. We collect a large volume of demonstrations and comparisons from human labelers, and fine-tune GPT-3 using behavioral cloning and reward modeling to do summarization recursively. At inference time, the model first summarizes small sections of the book and then recursively summarizes these summaries to produce a summary of the entire book. Our human labelers are able to supervise and evaluate the models quickly, despite not having read the entire books themselves. Our resulting model generates sensible summaries of entire books, even matching the quality of human-written summaries in a few cases (~5% of books). We achieve state-of-the-art results on the recent BookSum dataset for book-length summarization. A zero-shot question-answering model using these summaries achieves competitive results on the challenging NarrativeQA benchmark for answering questions about books and movie scripts. We release datasets of samples from our model.

* This was a joint project of the OpenAI Alignment team. JW and LO contributed equally. DMZ, NS, and RL were full-time contributors for most of the duration. JL and PC managed the team.

“…For example, Liu et al (2018) collect dialogue corrections from users during deployment, while Li et al (2017) collect both binary explicit feedback and implicit natural language feedback. Also, Hancock et al (2019) propose a lifetime learning framework to improve chatbot performance. The chatbot is trained not only to generate dialogues but also to predict user satisfactions.…”

Section: Dialogue and Question Answering (mentioning)

confidence: 99%

“…For example, for text classification, HITL improves classification accuracy (Smith et al, 2018;Jandot et al, 2016). Similarly, dialogue and question answering systems have higher ranking metric hits after adapting a HITL approach (Hancock et al, 2019;Brown et al, 2020). Researchers also find HITL improves model's robustness and generalization on different data (Stiennon et al, 2020;Jandot et al, 2016).…”

Section: Dialogue and Question Answering (mentioning)

confidence: 99%

“…A natural language interface is an interface where the user interacts with the computer through natural language. As this interface usually simulates having a conversation with a computer, it mostly comes with the purpose of building up a dialogue system (Hancock et al, 2019;Liu et al, 2018;Li et al, 2017). The natural language interface not only supports users to provide explicit feedback (Liu et al, 2018;Li et al, 2017), such as positive or negative responses.…”

Section: Natural Language Interface (mentioning)

confidence: 99%

“…Just like traditional NLP frameworks, there is a high-dimensional design space for HITL NLP systems. For example, human feedback can come from end users (Li et al, 2017) or crowd workers (Wallace et al, 2019), and human can intervene models during training (Stiennon et al, 2020) or deployment (Hancock et al, 2019). Good HITL NLP systems need to clearly communicates to humans of what the model needs, provide intuitive interfaces to collect feedback, and effectively learn from them.…”

Section: Introduction (mentioning)

confidence: 99%

Putting Humans in the Natural Language Processing Loop: A Survey

Wang, Choi, Xu et al. 2021

Preprint

How can we design Natural Language Processing (NLP) systems that learn from human feedback? There is a growing research body of Human-in-the-loop (HITL) NLP frameworks that continuously integrate human feedback to improve the model itself. HITL NLP research is nascent but multifarious: solving various NLP problems, collecting diverse feedback from different people, and applying different methods to learn from collected feedback. We present a survey of HITL NLP work from both Machine Learning (ML) and Human-Computer Interaction (HCI) communities that highlights its short yet inspiring history, and thoroughly summarize recent frameworks focusing on their tasks, goals, human interactions, and feedback learning methods. Finally, we discuss future directions for integrating human feedback in the NLP development loop.

Simulating the Effects of Social Presence on Trust, Privacy Concerns & Usage Intentions in Automated Bots for Finance

Coopamootoo, Toreini et al. 2020

2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

FinBots are chatbots built on automated decision technology, aimed to facilitate accessible banking and to support customers in making financial decisions. Chatbots are increasing in prevalence, sometimes even equipped to mimic human social rules, expectations and norms, decreasing the option for human-to-human interaction. As banks and financial advisory platforms move towards creating bots that enhance the current state of consumer trust and adoption rates, we investigated the effects of chatbot vignettes with and without socio-emotional features on intention to use the chatbot for financial support purposes. We conducted a between-subject online experiment with N = 410 participants. Participants in the control group were provided with a vignette describing a secure and reliable chatbot called XRO23, whereas participants in the experimental group were presented with a vignette describing a secure and reliable chatbot that is more human-like and named Emma. We found that Vignette Emma did not increase participants' trust levels nor lowered their privacy concerns even though it increased perception of social presence. However, we found that intention to use the presented chatbot for financial support was positively influenced by perceived humanness and trust in the bot. Participants were also more willing to share financially-sensitive information such as account number, sort code and payments information to XRO23 compared to Emma, revealing a preference for a technical and mechanical FinBot in information sharing. Overall, this research contributes to our understanding of the intention to use chatbots with different features as financial technology, in particular that socio-emotional support may not be favoured when designed separately from financial function.
