Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.517
Learning to Customize Model Structures for Few-shot Dialogue Generation Tasks

Abstract: Training the generative models with minimal corpus is one of the critical challenges for building open-domain dialogue systems. Existing methods tend to use the meta-learning framework, which pre-trains the parameters on all non-target tasks and then fine-tunes on the target task. However, fine-tuning distinguishes tasks from the parameter perspective but ignores the model-structure perspective, resulting in similar dialogue models for different tasks. In this paper, we propose an algorithm that can customize a uni…
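The pre-train-then-fine-tune meta-learning loop the abstract describes can be sketched on a toy regression problem. This is a first-order, Reptile-style illustration, not the paper's algorithm; the task slopes, learning rates, and step counts are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(slope):
    """A toy 1-D regression task: y = slope * x."""
    x = rng.uniform(-1, 1, size=20)
    return x, slope * x

def sgd_step(w, x, y, lr):
    """One gradient step on squared error for the model y_hat = w * x."""
    grad = np.mean(2 * (w * x - y) * x)
    return w - lr * grad

# Meta-training on non-target tasks: adapt on each task for a few steps,
# then move the shared initialization toward the adapted weight
# (a Reptile-style first-order update).
w_meta = 0.0
tasks = [make_task(s) for s in (1.5, 2.0, 2.5)]
for _ in range(200):
    for x, y in tasks:
        w_task = w_meta
        for _ in range(5):
            w_task = sgd_step(w_task, x, y, lr=0.1)
        w_meta += 0.1 * (w_task - w_meta)

# Fine-tuning on the target task (slope 2.2) with only 3 examples:
# the learned initialization is already close, so few steps suffice.
x_new = rng.uniform(-1, 1, size=3)
y_new = 2.2 * x_new
w = w_meta
for _ in range(10):
    w = sgd_step(w, x_new, y_new, lr=0.1)

print(w_meta, w)  # w_meta lands near the mean task, w is pulled toward 2.2
```

Note that every task ends up with the same model form here; only the scalar parameter differs, which is exactly the limitation the paper attributes to fine-tuning-only adaptation.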

Cited by 28 publications (19 citation statements)
References 24 publications
“…Data-driven models can also be combined with graphical models (Zhou et al., 2020; Song et al., 2019; Moon et al., 2019; Shi et al., 2020; Wu B. et al., 2020; Xu et al., 2020), rule-based or slot-filling systems (Tammewar et al., 2018; Zhang Z. et al., 2019), a knowledge base (Ganhotra and Polymenakos, 2018; Ghazvininejad et al., 2018; Luo et al., 2019; Yavuz et al., 2019; Moon et al., 2019; Wu et al., 2019; Lian et al., 2019; Zhang B. et al., 2020; Majumder et al., 2020; Tuan et al., 2021) or with automatic extraction of attributes from dialogue (Tigunova et al., 2019, 2020; Wu C.-S. et al., 2020, 2021; Ma et al., 2021) to improve the personalised entity selection in responses. Methods that adopt transfer learning (Genevay and Laroche, 2016; Lopez-Paz and Ranzato, 2017; Mo et al., 2017, 2018; Yang et al., 2017, 2018; Wolf et al., 2019; Golovanov et al., 2020), meta-learning (Finn et al., 2017; Santoro et al., 2016; Vinyals et al., 2016; Munkhdalai and Yu, 2017; Madotto et al., 2019; Zhang W.-N. et al., 2019; Song et al., 2020; Tian et al., 2021) and key-value memory structures (Xu et al., 2017; Kaiser et al., 2017; Zhu and Yang, 2018, 2020; de Masson d’Autume et al., 2019) could provide effective insights to alleviate data scarcity and enable quick adaptation to various users through improving few-shot and lifelong learning capabilities of the dialogue models (Wang et al., 2020b).…”
Section: Discussion
confidence: 99%
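The key-value memory structures mentioned in the statement above can be illustrated with a minimal retrieval sketch, loosely in the spirit of Kaiser et al. (2017): store (key embedding, value) pairs and answer a query with the value of the most similar key. The class name, dimensions, and stored values here are all hypothetical:

```python
import numpy as np

class KeyValueMemory:
    """Minimal key-value memory: nearest-key lookup by cosine similarity."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = []

    def write(self, key, value):
        # Append a new (key, value) pair to the memory.
        self.keys = np.vstack([self.keys, key])
        self.values.append(value)

    def read(self, query):
        # Cosine similarity of the query against all stored keys.
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-9
        )
        return self.values[int(np.argmax(sims))]

mem = KeyValueMemory(dim=3)
mem.write(np.array([1.0, 0.0, 0.0]), "greeting")
mem.write(np.array([0.0, 1.0, 0.0]), "farewell")
print(mem.read(np.array([0.9, 0.1, 0.0])))  # → greeting
```

Because new pairs can be written without retraining, this kind of structure is one way such systems support the lifelong-learning behaviour the statement refers to.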
“…Another solution could be to combine a data-driven model with another approach to compensate for the deficiencies in the models, such as combining a generative model (e.g., Sequence-to-Sequence) with a Memory Network (Madotto et al., 2018; Zhang B. et al., 2020) or with transformers (Vaswani et al., 2017), such as in the work of Roller et al. (2020), Generative Pre-trained Transformer (GPT) (Radford et al., 2018, 2019; Brown et al., 2020; Zhang Y. et al., 2020), Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019; Song et al., 2021), and Poly-encoders (Humeau et al., 2020; Li et al., 2020). Data-driven models can also be combined with graphical models (Zhou et al., 2020; Song et al., 2019; Moon et al., 2019; Shi et al., 2020; Wu B. et al., 2020; Xu et al., 2020), rule-based or slot-filling systems (Tammewar et al., 2018; Zhang Z. et al., 2019), a knowledge base (Ganhotra and Polymenakos, 2018; Ghazvininejad et al., 2018; Luo et al., 2019; Yavuz et al., 2019; Moon et al., 2019; Wu et al., 2019; Lian et al., 2019; Zhang B. et al., 2020; Majumder et al., 2020; Tuan et al., 2021) or with automatic extraction of attributes from dialogue (Tigunova et al., 2019, 2020; Wu C.-S. et al., 2020, 2021; Ma et al., 2021) to improve the personalised entity selection in responses.…”
Section: Discussion
confidence: 99%
“…Meta-learning has recently been explored in addressing the limited personalized data issue. CMAML (Song et al., 2020c) is a meta-learning-based method that learns from few-shot personas by customizing the model structures. Besides the meta-learning methods, GDR (Song et al., 2020a) introduces inference ability on PersonaChat with a generate-refine framework.…”
Section: Compared Methods
confidence: 99%
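The idea of "customizing the model structures" per task, as the statement above describes CMAML doing, can be loosely illustrated with per-task pruning masks over shared weights: each task keeps only the connections most important to it, so different tasks end up with different sub-networks. This is a sketch of the general idea only, not CMAML's actual algorithm; the importance scores and keep ratio are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
shared_w = rng.normal(size=(4, 4))  # weight matrix shared across tasks

def task_mask(importance, keep_ratio=0.5):
    """Keep only the connections with the largest task-specific importance."""
    flat = np.sort(importance.ravel())
    threshold = flat[int(flat.size * (1 - keep_ratio))]
    return (importance >= threshold).astype(float)

# Two tasks whose (hypothetical) importance scores favor opposite
# connections end up with different sub-network structures.
importance_a = np.arange(16.0).reshape(4, 4)
importance_b = np.arange(16.0)[::-1].reshape(4, 4)
w_task_a = shared_w * task_mask(importance_a)
w_task_b = shared_w * task_mask(importance_b)

print(int(task_mask(importance_a).sum()))  # → 8 (half the connections kept)
```

The point of contrast with plain fine-tuning is visible here: the two tasks differ not just in parameter values but in which connections exist at all.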
“…This problem becomes even more severe in emerging research topics (Baig, 2020; Baines et al., 2020), such as COVID-19, where curated definitions could be imprecise and do not scale to rapidly proposed terminologies. Neural text generation (Bowman et al., 2016; Vaswani et al., 2017; Sutskever et al., 2014; Song et al., 2020b) could be a plausible solution to this problem by generating definition text based on the terminology text. Encouraging results by neural text generation have been observed on related tasks, such as paraphrase generation (Li et al., 2020), description generation (Cheng et al., 2020), synonym generation (Gupta et al., 2015) and data augmentation (Malandrakis et al., 2019).…”
Section: Introduction
confidence: 99%