Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.14

BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data

Abstract: Maintaining consistent personas is essential for dialogue agents. Although tremendous advances have been made, the limited scale of annotated persona-dense data remains a barrier to training robust and consistent persona-based dialogue models. In this work, we show how these challenges can be addressed by disentangling persona-based dialogue generation into two sub-tasks with a novel BERT-over-BERT (BoB) model. Specifically, the model consists of a BERT-based encoder and two BERT-based decoders, where…
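To make the architecture described in the abstract concrete, the following is a minimal sketch of the BERT-over-BERT layout (one BERT-based encoder, two BERT-based decoders) using the Hugging Face transformers library. It is an illustration under assumptions — the checkpoint, the wiring of the second decoder, and the training losses are simplifications, and the paper's unlikelihood objective on NLI-labeled data is omitted — not the authors' released implementation.

```python
from transformers import BertConfig, BertLMHeadModel, BertModel, BertTokenizer

name = "bert-base-uncased"  # assumed checkpoint for the sketch
tokenizer = BertTokenizer.from_pretrained(name)

# Shared BERT encoder over persona + dialogue context.
encoder = BertModel.from_pretrained(name)

# Two BERT-based decoders with cross-attention: decoder1 drafts a
# response; decoder2 refines it toward persona consistency.
dec_cfg = BertConfig.from_pretrained(name, is_decoder=True, add_cross_attention=True)
decoder1 = BertLMHeadModel.from_pretrained(name, config=dec_cfg)
decoder2 = BertLMHeadModel.from_pretrained(name, config=dec_cfg)

src = tokenizer("i love hiking . [SEP] what do you do for fun ?", return_tensors="pt")
tgt = tokenizer("i spend my weekends hiking in the hills .", return_tensors="pt")

enc_out = encoder(**src).last_hidden_state

# Decoder 1: standard encoder-decoder generation loss over the response.
d1 = decoder1(input_ids=tgt.input_ids,
              attention_mask=tgt.attention_mask,
              encoder_hidden_states=enc_out,
              encoder_attention_mask=src.attention_mask,
              labels=tgt.input_ids,
              output_hidden_states=True)

# Decoder 2: attends to decoder 1's hidden states to refine the draft.
# (The paper additionally trains this decoder with an unlikelihood loss
# on NLI-labeled persona data, omitted in this sketch.)
d2 = decoder2(input_ids=tgt.input_ids,
              attention_mask=tgt.attention_mask,
              encoder_hidden_states=d1.hidden_states[-1],
              encoder_attention_mask=tgt.attention_mask,
              labels=tgt.input_ids)

loss = d1.loss + d2.loss  # joint training signal
```

Both decoders are initialized from the same pre-trained BERT weights, which is what lets the model be trained from limited personalized data; the cross-attention weights are newly initialized.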

Cited by 58 publications (39 citation statements). References 27 publications (32 reference statements).
“…Another solution could be to combine a data-driven model with another approach to compensate for the deficiencies in the models, such as combining a generative model (e.g., Sequence-to-Sequence) with a Memory Network (Madotto et al., 2018; Zhang B. et al., 2020) or with Transformers (Vaswani et al., 2017), as in the work of Roller et al. (2020), the Generative Pre-trained Transformer (GPT) (Radford et al., 2018, 2019; Brown et al., 2020; Zhang Y. et al., 2020), Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019; Song et al., 2021), and Poly-encoders (Humeau et al., 2020; Li et al., 2020). Data-driven models can also be combined with graphical models (Zhou et al., 2020; Song et al., 2019; Moon et al., 2019; Shi et al., 2020; Wu B. et al., 2020; Xu et al., 2020), rule-based or slot-filling systems (Tammewar et al., 2018; Zhang Z. et al., 2019), a knowledge base (Ganhotra and Polymenakos, 2018; Ghazvininejad et al., 2018; Luo et al., 2019; Yavuz et al., 2019; Moon et al., 2019; Wu et al., 2019; Lian et al., 2019; Zhang B. et al., 2020; Majumder et al., 2020; Tuan et al., 2021), or with automatic extraction of attributes from dialogue (Tigunova et al., 2019, 2020; Wu C.-S. et al., 2020, 2021; Ma et al., 2021) to improve personalised entity selection in responses.…”
Section: Discussion
“…In this method, a large pre-trained model is used to initialize the encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers' personas together with dialogue histories. Song et al. [1445] propose a BERT-over-BERT architecture with a BERT-based encoder and two BERT-based decoders, where one decoder is for response generation and the other is for persona information understanding.…”
Section: Persona In Conversation
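The "personal attribute embeddings" mentioned in the excerpt above are straightforward to picture in code: a learned per-speaker embedding is added to the usual token and position embeddings before the Transformer encoder. The following PyTorch sketch is a generic illustration of that idea; the class, module names, and sizes are assumptions, not the cited system.

```python
import torch
import torch.nn as nn

class PersonaAwareEmbedding(nn.Module):
    """Token + position + speaker (persona attribute) embeddings."""
    def __init__(self, vocab_size=30522, hidden=768, n_speakers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.pos = nn.Embedding(max_len, hidden)
        self.speaker = nn.Embedding(n_speakers, hidden)  # persona attribute embedding

    def forward(self, input_ids, speaker_ids):
        # speaker_ids marks, per token, which speaker (persona) produced it.
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        return self.tok(input_ids) + self.pos(positions) + self.speaker(speaker_ids)

emb = PersonaAwareEmbedding()
ids = torch.randint(0, 30522, (1, 8))
spk = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])  # two-speaker dialogue history
hidden_states = emb(ids, spk)  # feed into a Transformer encoder as usual
```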
“…Clearly, this problem will significantly reduce the attractiveness of dialogue systems, and it has attracted researchers' interest. Song et al. [1445] introduced conversational natural language inference data and models to solve this problem. Though there have been some attempts to address this issue, it is still a challenging and emerging area.…”
Section: Frontier Trends
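As a rough illustration of the NLI-based consistency idea above, one can score a (persona, response) pair with an NLI classifier and treat a predicted contradiction as an inconsistency. The checkpoint below (roberta-large-mnli) is a generic stand-in chosen for this sketch; the cited work trains on dialogue-specific NLI data rather than MNLI.

```python
from transformers import pipeline

# Generic NLI model as a stand-in; a dialogue-specific NLI model would
# be trained on persona/response pairs as in the cited work.
nli = pipeline("text-classification", model="roberta-large-mnli")

persona = "I am a vegetarian."
response = "I had a great steak for dinner last night."

# Premise/hypothesis pair: does the response contradict the persona?
result = nli({"text": persona, "text_pair": response})
print(result)  # e.g. [{'label': 'CONTRADICTION', 'score': ...}] -> flag as inconsistent
```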
“…The refine mechanism has proved to be an effective and compelling technique in both natural language understanding and generation tasks (Zhang et al., 2019a; Wu et al., 2020b; Song et al., 2021). For natural language understanding, Wu et al. (2020b) design a novel two-pass iteration mechanism to handle the uncoordinated-slots problem caused by the conditional independence of non-autoregressive models, in which the model uses the B-label output from the first pass as input to the second pass.…”
Section: Refine Mechanism
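To make the two-pass idea in the excerpt concrete: a first non-autoregressive pass predicts per-token slot labels independently, then a second pass re-predicts them conditioned on the first-pass labels, so tokens within one slot can coordinate. The schematic PyTorch sketch below uses illustrative module names and sizes as assumptions; it is not Wu et al.'s (2020b) implementation.

```python
import torch
import torch.nn as nn

class TwoPassSlotTagger(nn.Module):
    def __init__(self, hidden=768, n_labels=20):
        super().__init__()
        self.first = nn.Linear(hidden, n_labels)         # pass 1: per-token labels, predicted independently
        self.label_emb = nn.Embedding(n_labels, hidden)  # embed pass-1 predictions
        self.second = nn.Linear(hidden, n_labels)        # pass 2: labels conditioned on pass 1

    def forward(self, token_states):
        logits1 = self.first(token_states)
        pred1 = logits1.argmax(-1)  # the B-label output of the first pass
        # The second pass sees the encoder states plus the first-pass
        # labels, so tokens inside one slot can agree with each other.
        logits2 = self.second(token_states + self.label_emb(pred1))
        return logits1, logits2

tagger = TwoPassSlotTagger()
states = torch.randn(1, 10, 768)  # e.g. BERT outputs for a 10-token utterance
logits1, logits2 = tagger(states)
```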
“…For natural language generation, Zhang et al. (2019a) use a refine mechanism to generate refined summaries, in the first work to apply BERT as a decoder. Recently, a novel BERT-over-BERT (BoB) model was proposed to solve the response generation and consistency understanding tasks simultaneously (Song et al., 2021). In this paper, we use the topicRefine framework to build a topic-aware multi-turn end-to-end dialogue system, aiming to generate informative and topic-related dialogue responses.…”
Section: Refine Mechanism
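The draft-then-refine pattern behind these generation systems can be sketched generically: generate a first response, then feed the context plus the draft back through the model for a second, refining pass. The checkpoint and prompt format below are assumptions for illustration only; topicRefine additionally conditions the refining pass on predicted topic words, which this sketch omits.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Any seq2seq model works for the sketch; t5-small is just small and public.
tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

context = "respond: do you have any plans for the weekend ?"  # hypothetical prompt format

# Pass 1: draft a response from the dialogue context alone.
draft_ids = model.generate(**tok(context, return_tensors="pt"), max_new_tokens=32)
draft = tok.decode(draft_ids[0], skip_special_tokens=True)

# Pass 2: refine, conditioning on the context plus the first-pass draft
# (topicRefine would also append predicted topic words here).
second_input = f"{context} draft: {draft}"
final_ids = model.generate(**tok(second_input, return_tensors="pt"), max_new_tokens=32)
print(tok.decode(final_ids[0], skip_special_tokens=True))
```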