Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.272

Diversifying Dialog Generation via Adaptive Label Smoothing

Abstract: Neural dialogue generation models trained with the one-hot target distribution suffer from the over-confidence issue, which leads to poor generation diversity as widely reported in the literature. Although existing approaches such as label smoothing can alleviate this issue, they fail to adapt to diverse dialog contexts. In this paper, we propose an Adaptive Label Smoothing (AdaLabel) approach that can adaptively estimate a target label distribution at each time step for different contexts. The maximum probabi…
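Since the abstract is cut off above, the general idea can be illustrated with a minimal sketch: build a soft target distribution whose shape depends on the current decoding context rather than using a fixed one-hot target. The function name, the eps_max cap, and the heuristic of tying the smoothing weight to the model's maximum predicted probability are illustrative assumptions, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def adaptive_soft_targets(logits, gold_ids, eps_max=0.3):
    """Build a per-time-step soft target distribution (illustrative sketch).

    Assumed heuristic: the probability mass moved off the gold token grows with
    the model's own confidence (its maximum predicted probability), and that
    mass is spread over the non-gold tokens in proportion to the model's
    current predictions.
    """
    probs = torch.softmax(logits, dim=-1)                 # (batch, steps, vocab)
    max_prob = probs.max(dim=-1, keepdim=True).values     # per-step model confidence
    eps = eps_max * max_prob                               # context-dependent smoothing weight

    one_hot = F.one_hot(gold_ids, num_classes=probs.size(-1)).float()
    non_gold = probs * (1.0 - one_hot)                     # zero out the gold token
    non_gold = non_gold / non_gold.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    return (1.0 - eps) * one_hot + eps * non_gold          # each row sums to 1
```

In plain label smoothing, eps would be a constant and the off-gold mass uniform; in this sketch both depend on the prediction at each time step, which is what makes the smoothing "adaptive".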

Cited by 16 publications (14 citation statements)
References 30 publications
“…• AdaLabel: In the AdaLabel model [Wang et al., 2021], the authors applied adaptive label smoothing to prevent the model from being overconfident about a single choice. The main idea of the paper is to use a context-dependent soft-target distribution instead of the usual one-hot distribution.…”
Section: Baselines
Mentioning confidence: 99%
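To make the contrast in that statement concrete, here is a generic sketch (not code from either paper) of the two training objectives; soft_targets is assumed to come from whatever adaptive smoothing scheme is in use.

```python
import torch
import torch.nn.functional as F

def one_hot_nll(logits, gold_ids):
    """Standard objective: negative log-likelihood of the gold (one-hot) token."""
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), gold_ids.reshape(-1))

def soft_target_ce(logits, soft_targets):
    """Objective against a context-dependent soft target distribution."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```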
“…Though it is not fully clear how large their pre-training corpus is, it has been suggested that GloVe embeddings trained on corpora larger than Wikipedia (Zheng et al., 2019a,b; Zhuang et al., 2018) are a better option. Moreover, as pre-training based generative models are becoming the de facto standard for text generation tasks (Zheng et al., 2020b; Zhang et al., 2020; Zheng et al., 2021b; Wang et al., 2020, 2021; Wu et al., 2021; He et al., 2021; Zheng et al., 2021a; Zhou et al., 2021; Liu et al., 2021; He et al., 2022), replacing the generator with a pre-trained GPT model (Radford et al., 2018) would be a promising direction to pursue.…”
Section: Hyper-parameters Matters
Mentioning confidence: 99%
“…Traditional dialogue systems [17,33] usually consist of three components: natural language understanding (NLU) [28,30,58,59], dialogue management (DM) [6,7,18], and natural language generation (NLG) [50,63,65,66] modules. Empirically, NLU plays the most important role in task-oriented dialogue systems, covering tasks such as intent detection [12,13,29,57], slot filling [61], and semantic parsing [19…
Section: Related Work
Mentioning confidence: 99%