Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.272

Diversifying Dialog Generation via Adaptive Label Smoothing

Abstract: Neural dialogue generation models trained with the one-hot target distribution suffer from the over-confidence issue, which leads to poor generation diversity as widely reported in the literature. Although existing approaches such as label smoothing can alleviate this issue, they fail to adapt to diverse dialog contexts. In this paper, we propose an Adaptive Label Smoothing (AdaLabel) approach that can adaptively estimate a target label distribution at each time step for different contexts. The maximum probabi…
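Since the abstract is cut off above, the general idea can be illustrated with a minimal sketch: build a soft target distribution whose shape depends on the current decoding context rather than using a fixed one-hot target. The function name, the eps_max cap, and the heuristic of tying the smoothing weight to the model's maximum predicted probability are illustrative assumptions, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def adaptive_soft_targets(logits, gold_ids, eps_max=0.3):
    """Build a per-time-step soft target distribution (illustrative sketch).

    Assumed heuristic: the probability mass moved off the gold token grows with
    the model's own confidence (its maximum predicted probability), and that
    mass is spread over the non-gold tokens in proportion to the model's
    current predictions.
    """
    probs = torch.softmax(logits, dim=-1)                 # (batch, steps, vocab)
    max_prob = probs.max(dim=-1, keepdim=True).values     # per-step model confidence
    eps = eps_max * max_prob                               # context-dependent smoothing weight

    one_hot = F.one_hot(gold_ids, num_classes=probs.size(-1)).float()
    non_gold = probs * (1.0 - one_hot)                     # zero out the gold token
    non_gold = non_gold / non_gold.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    return (1.0 - eps) * one_hot + eps * non_gold          # each row sums to 1
```

In plain label smoothing, eps would be a constant and the off-gold mass uniform; in this sketch both depend on the prediction at each time step, which is what makes the smoothing "adaptive".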

Cited by 16 publications (14 citation statements)
References 30 publications
“…• AdaLabel: In the AdaLabel model [Wang et al., 2021], the authors applied adaptive label smoothing to prevent the model from being overconfident about a single choice. The main idea of the paper is to use a context-dependent soft-target distribution instead of the usual one-hot distribution.…”
Section: Baselines
Mentioning confidence: 99%
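To make the contrast in that statement concrete, here is a generic sketch (not code from either paper) of the two training objectives; soft_targets is assumed to come from whatever adaptive smoothing scheme is in use.

```python
import torch
import torch.nn.functional as F

def one_hot_nll(logits, gold_ids):
    """Standard objective: negative log-likelihood of the gold (one-hot) token."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), gold_ids.reshape(-1))

def soft_target_ce(logits, soft_targets):
    """Objective against a context-dependent soft target distribution."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```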
“…Though it is not fully clear how large their pre-training corpus is, it has been suggested that GloVe embeddings trained on corpora larger than Wikipedia (Zheng et al., 2019a,b; Zhuang et al., 2018) are a better option. Moreover, as pre-training based generative models are becoming the de facto standard for text generation tasks (Zheng et al., 2020b; Zhang et al., 2020; Zheng et al., 2021b; Wang et al., 2020, 2021; Wu et al., 2021; He et al., 2021; Zheng et al., 2021a; Zhou et al., 2021; Liu et al., 2021; He et al., 2022), replacing the generator with a pre-trained GPT model (Radford et al., 2018) would be a promising direction to pursue.…”
Section: Hyper-parameters Matters
Mentioning confidence: 99%
“…Traditional dialogue systems [17,33] usually consist of three components: natural language understanding (NLU) [28,30,58,59], dialogue management (DM) [6,7,18], and natural language generation (NLG) [50,63,65,66] modules. Empirically, NLU plays the most important role in task-oriented dialogue systems, covering tasks such as intent detection [12,13,29,57], slot filling [61], and semantic parsing [19…
Section: Related Work
Mentioning confidence: 99%