Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.14

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

Abstract: Maintaining consistent personas is essential for dialogue agents. Although tremendous advances have been made, the limited scale of annotated persona-dense data remains a barrier to training robust and consistent persona-based dialogue models. In this work, we show how these challenges can be addressed by disentangling persona-based dialogue generation into two sub-tasks with a novel BERT-over-BERT (BoB) model. Specifically, the model consists of a BERT-based encoder and two BERT-based decoders, where…
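To make the architecture described in the abstract concrete, the following is a minimal sketch of the BERT-over-BERT layout (one BERT-based encoder, two BERT-based decoders) using the Hugging Face transformers library. It is an illustration under assumptions — the checkpoint, the wiring of the second decoder, and the training losses are simplifications, and the paper's unlikelihood objective on NLI-labeled data is omitted — not the authors' released implementation.

```python
from transformers import BertConfig, BertLMHeadModel, BertModel, BertTokenizer

name = "bert-base-uncased"  # assumed checkpoint for the sketch
tokenizer = BertTokenizer.from_pretrained(name)

# Shared BERT encoder over persona + dialogue context.
encoder = BertModel.from_pretrained(name)

# Two BERT-based decoders with cross-attention: decoder1 drafts a
# response; decoder2 refines it toward persona consistency.
dec_cfg = BertConfig.from_pretrained(name, is_decoder=True, add_cross_attention=True)
decoder1 = BertLMHeadModel.from_pretrained(name, config=dec_cfg)
decoder2 = BertLMHeadModel.from_pretrained(name, config=dec_cfg)

src = tokenizer("i love hiking . [SEP] what do you do for fun ?", return_tensors="pt")
tgt = tokenizer("i spend my weekends hiking in the hills .", return_tensors="pt")

enc_out = encoder(**src).last_hidden_state

# Decoder 1: standard encoder-decoder generation loss over the response.
d1 = decoder1(input_ids=tgt.input_ids,
              attention_mask=tgt.attention_mask,
              encoder_hidden_states=enc_out,
              encoder_attention_mask=src.attention_mask,
              labels=tgt.input_ids,
              output_hidden_states=True)

# Decoder 2: attends to decoder 1's hidden states to refine the draft.
# (The paper additionally trains this decoder with an unlikelihood loss
# on NLI-labeled persona data, omitted in this sketch.)
d2 = decoder2(input_ids=tgt.input_ids,
              attention_mask=tgt.attention_mask,
              encoder_hidden_states=d1.hidden_states[-1],
              encoder_attention_mask=tgt.attention_mask,
              labels=tgt.input_ids)

loss = d1.loss + d2.loss  # joint training signal
```

Both decoders are initialized from the same pre-trained BERT weights, which is what lets the model be trained from limited personalized data; the cross-attention weights are newly initialized.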

Cited by 58 publications (39 citation statements). References 27 publications (32 reference statements).
“…Another solution could be to combine a data-driven model with another approach to compensate for the deficiencies in the models, such as combining a generative model (e.g., Sequence-to-Sequence) with a Memory Network (Madotto et al., 2018; Zhang B. et al., 2020) or with Transformers (Vaswani et al., 2017), as in the work of Roller et al. (2020), the Generative Pre-trained Transformer (GPT) (Radford et al., 2018, 2019; Brown et al., 2020; Zhang Y. et al., 2020), Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019; Song et al., 2021), and Poly-encoders (Humeau et al., 2020; Li et al., 2020). Data-driven models can also be combined with graphical models (Zhou et al., 2020; Song et al., 2019; Moon et al., 2019; Shi et al., 2020; Wu B. et al., 2020; Xu et al., 2020), rule-based or slot-filling systems (Tammewar et al., 2018; Zhang Z. et al., 2019), a knowledge base (Ganhotra and Polymenakos, 2018; Ghazvininejad et al., 2018; Luo et al., 2019; Yavuz et al., 2019; Moon et al., 2019; Wu et al., 2019; Lian et al., 2019; Zhang B. et al., 2020; Majumder et al., 2020; Tuan et al., 2021), or with automatic extraction of attributes from dialogue (Tigunova et al., 2019, 2020; Wu C.-S. et al., 2020, 2021; Ma et al., 2021) to improve personalised entity selection in responses.…”
Section: Discussion
“…In this method, a large pre-trained model is used to initialize the encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers' personas together with dialogue histories. Song et al. [1445] propose a BERT-over-BERT architecture with a BERT-based encoder and two BERT-based decoders, where one decoder is for response generation and the other is for persona information understanding.…”
Section: Persona In Conversation
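The "personal attribute embeddings" mentioned in the excerpt above are straightforward to picture in code: a learned per-speaker embedding is added to the usual token and position embeddings before the Transformer encoder. The following PyTorch sketch is a generic illustration of that idea; the class, module names, and sizes are assumptions, not the cited system.

```python
import torch
import torch.nn as nn

class PersonaAwareEmbedding(nn.Module):
    """Token + position + speaker (persona attribute) embeddings."""
    def __init__(self, vocab_size=30522, hidden=768, n_speakers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.pos = nn.Embedding(max_len, hidden)
        self.speaker = nn.Embedding(n_speakers, hidden)  # persona attribute embedding

    def forward(self, input_ids, speaker_ids):
        # speaker_ids marks, per token, which speaker (persona) produced it.
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        return self.tok(input_ids) + self.pos(positions) + self.speaker(speaker_ids)

emb = PersonaAwareEmbedding()
ids = torch.randint(0, 30522, (1, 8))
spk = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])  # two-speaker dialogue history
hidden_states = emb(ids, spk)  # feed into a Transformer encoder as usual
```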
“…Clearly, this problem will significantly reduce the attractiveness of dialogue systems, and it has attracted researchers' interest. Song et al. [1445] introduced conversational natural language inference data and models to solve this problem. Though there have been some attempts to address this issue, it is still a challenging and emerging area.…”
Section: Frontier Trends
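As a rough illustration of the NLI-based consistency idea above, one can score a (persona, response) pair with an NLI classifier and treat a predicted contradiction as an inconsistency. The checkpoint below (roberta-large-mnli) is a generic stand-in chosen for this sketch; the cited work trains on dialogue-specific NLI data rather than MNLI.

```python
from transformers import pipeline

# Generic NLI model as a stand-in; a dialogue-specific NLI model would
# be trained on persona/response pairs as in the cited work.
nli = pipeline("text-classification", model="roberta-large-mnli")

persona = "I am a vegetarian."
response = "I had a great steak for dinner last night."

# Premise/hypothesis pair: does the response contradict the persona?
result = nli({"text": persona, "text_pair": response})
print(result)  # e.g. [{'label': 'CONTRADICTION', 'score': ...}] -> flag as inconsistent
```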
“…The refine mechanism has proved to be an effective and compelling technique in both natural language understanding and generation tasks (Zhang et al., 2019a; Wu et al., 2020b; Song et al., 2021). For natural language understanding, Wu et al. (2020b) design a novel two-pass iteration mechanism to handle the uncoordinated-slots problem caused by the conditional independence of non-autoregressive models, in which the model uses the B-label output from the first pass as input to the second pass.…”
Section: Refine Mechanism
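To make the two-pass idea in the excerpt concrete: a first non-autoregressive pass predicts per-token slot labels independently, then a second pass re-predicts them conditioned on the first-pass labels, so tokens within one slot can coordinate. The schematic PyTorch sketch below uses illustrative module names and sizes as assumptions; it is not Wu et al.'s (2020b) implementation.

```python
import torch
import torch.nn as nn

class TwoPassSlotTagger(nn.Module):
    def __init__(self, hidden=768, n_labels=20):
        super().__init__()
        self.first = nn.Linear(hidden, n_labels)         # pass 1: per-token labels, predicted independently
        self.label_emb = nn.Embedding(n_labels, hidden)  # embed pass-1 predictions
        self.second = nn.Linear(hidden, n_labels)        # pass 2: labels conditioned on pass 1

    def forward(self, token_states):
        logits1 = self.first(token_states)
        pred1 = logits1.argmax(-1)  # the B-label output of the first pass
        # The second pass sees the encoder states plus the first-pass
        # labels, so tokens inside one slot can agree with each other.
        logits2 = self.second(token_states + self.label_emb(pred1))
        return logits1, logits2

tagger = TwoPassSlotTagger()
states = torch.randn(1, 10, 768)  # e.g. BERT outputs for a 10-token utterance
logits1, logits2 = tagger(states)
```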
“…For natural language generation, Zhang et al. (2019a) use a refine mechanism to generate refined summaries, in the first work to apply BERT as a decoder. Recently, a novel BERT-over-BERT (BoB) model was proposed to solve the response generation and consistency understanding tasks simultaneously (Song et al., 2021). In this paper, we use the topicRefine framework to build a topic-aware multi-turn end-to-end dialogue system, aiming to generate informative and topic-related dialogue responses.…”
Section: Refine Mechanism
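The draft-then-refine pattern behind these generation systems can be sketched generically: generate a first response, then feed the context plus the draft back through the model for a second, refining pass. The checkpoint and prompt format below are assumptions for illustration only; topicRefine additionally conditions the refining pass on predicted topic words, which this sketch omits.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Any seq2seq model works for the sketch; t5-small is just small and public.
tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

context = "respond: do you have any plans for the weekend ?"  # hypothetical prompt format

# Pass 1: draft a response from the dialogue context alone.
draft_ids = model.generate(**tok(context, return_tensors="pt"), max_new_tokens=32)
draft = tok.decode(draft_ids[0], skip_special_tokens=True)

# Pass 2: refine, conditioning on the context plus the first-pass draft
# (topicRefine would also append predicted topic words here).
second_input = f"{context} draft: {draft}"
final_ids = model.generate(**tok(second_input, return_tensors="pt"), max_new_tokens=32)
print(tok.decode(final_ids[0], skip_special_tokens=True))
```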