Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.737
F²-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax

Abstract: Despite recent advances in neural text generation, encoding the rich diversity in human language remains elusive. We argue that the sub-optimal text generation is mainly attributable to the imbalanced token distribution, which particularly misdirects the learning model when trained with the maximum-likelihood objective. As a simple yet effective remedy, we propose two novel methods, F²-Softmax and MefMax, for a balanced training even with the skewed frequency distribution. MefMax assigns tokens uniquely to fr…
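The factorization the abstract describes (predict a frequency class first, then a token within that class) can be sketched as follows. This is a minimal PyTorch illustration under assumed shapes and names (`FactorizedSoftmax` and `class_of_token` are hypothetical); the paper's exact parameterization may differ, e.g. it may use per-class output layers rather than a masked full-vocabulary head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedSoftmax(nn.Module):
    """Two-stage softmax: predict a frequency class, then a token within
    that class. Class ids (`class_of_token`) are assumed to come from a
    MefMax-style assignment and are simply passed in here."""

    def __init__(self, hidden_dim, vocab_size, class_of_token):
        super().__init__()
        # class_of_token: LongTensor [vocab_size], class id of each token
        self.register_buffer("class_of_token", class_of_token)
        self.num_classes = int(class_of_token.max().item()) + 1
        self.class_head = nn.Linear(hidden_dim, self.num_classes)
        self.token_head = nn.Linear(hidden_dim, vocab_size)

    def log_prob(self, hidden, target):
        # hidden: [batch, hidden_dim], target: [batch] gold token ids
        target_class = self.class_of_token[target]                     # [batch]
        log_p_class = F.log_softmax(self.class_head(hidden), dim=-1)   # [batch, C]

        # Restrict the token-level softmax to tokens in the target's class.
        token_logits = self.token_head(hidden)                         # [batch, V]
        same_class = self.class_of_token.unsqueeze(0) == target_class.unsqueeze(1)
        token_logits = token_logits.masked_fill(~same_class, float("-inf"))
        log_p_token = F.log_softmax(token_logits, dim=-1)

        # log p(x) = log p(class(x)) + log p(x | class(x))
        return (log_p_class.gather(1, target_class.unsqueeze(1))
                + log_p_token.gather(1, target.unsqueeze(1))).squeeze(1)
```

Training then simply minimizes the negative of this log-probability; because each class covers a comparable share of the corpus frequency mass, the within-class softmax sees a far less skewed target distribution than a flat softmax would.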

Citations: cited by 11 publications (9 citation statements)
References: 31 publications
“…As a standard approach to training a neural text generation model, MLE has been shown to be defective. Choi et al. (2020) demonstrate that MLE may mislead the model because of the imbalanced token distribution. Thus, they design a greedy approach, MefMax, and factorize the softmax to ensure balanced training according to word frequency.…”
Section: Training-based Methods
Citation type: mentioning; confidence: 99%
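The "greedy approach" mentioned in this statement assigns tokens to frequency classes so that the classes carry roughly comparable frequency mass. The sketch below is only an assumption-laden stand-in: the actual MefMax criterion in Choi et al. (2020) maximizes a mean-efficiency (normalized-entropy) objective, whereas this version greedily fills equal-mass buckets; `greedy_frequency_classes` is a hypothetical helper name.

```python
from collections import Counter

def greedy_frequency_classes(token_counts, num_classes):
    """Greedily partition tokens (sorted by frequency) into contiguous
    classes whose total frequency mass is roughly equal. Simplified
    illustration of frequency-balanced bucketing, not the exact MefMax
    objective."""
    sorted_tokens = sorted(token_counts, key=token_counts.get, reverse=True)
    total = sum(token_counts.values())
    target_mass = total / num_classes

    class_of_token, cls, mass = {}, 0, 0
    for tok in sorted_tokens:
        class_of_token[tok] = cls
        mass += token_counts[tok]
        if mass >= target_mass and cls < num_classes - 1:
            cls, mass = cls + 1, 0
    return class_of_token

# Toy usage: frequent tokens land in low class ids, rare tokens in high ones.
counts = Counter("the cat sat on the mat and the dog sat on the log".split())
print(greedy_frequency_classes(counts, num_classes=3))
```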
“…We performed experiments on the WikiText-103 dataset (Merity et al., 2017), a large-scale benchmark containing more than 29 thousand Wikipedia articles and over 100 million words in total. WikiText-103 has been widely used in language modeling work (Welleck et al., 2020; Martins et al., 2020; Choi et al., 2020), but in order to train our POS-guided Softmax we need the corresponding POS tags. We use Stanford CoreNLP's POS tagger (Manning et al., 2014) to annotate words in WikiText-103 with XPOS tags (Hornby et al., 2017).…”
Section: Dataset
Citation type: mentioning; confidence: 99%
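A minimal sketch of the annotation step this statement describes. The cited work uses Stanford CoreNLP's tagger; the stanza library is used here purely as an assumed stand-in, since it exposes XPOS (Penn Treebank-style) tags through a simple Python API.

```python
# Annotate raw text with XPOS tags, roughly mirroring the preprocessing
# described above (CoreNLP itself would be driven through its own server/API).
import stanza

stanza.download("en")  # one-time model download
nlp = stanza.Pipeline("en", processors="tokenize,pos")

doc = nlp("Neural text generation remains an open research problem.")
for sentence in doc.sentences:
    for word in sentence.words:
        print(word.text, word.xpos)
```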
“…Thus, the models often struggle to generate diverse outputs. This has been addressed with different techniques, such as unlikelihood training (Welleck et al., 2020) and F²-Softmax (Choi et al., 2020). Clarification utility maximization (next subsection) also implicitly addresses this issue.…”
Section: Sequence-to-sequence Models
Citation type: mentioning; confidence: 99%
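For context on the first technique named in this statement: unlikelihood training discourages the model from placing probability mass on negative candidate tokens, typically tokens that already appeared earlier in the context. The PyTorch sketch below is a hedged token-level illustration; `unlikelihood_loss`, its candidate construction, and its normalization are illustrative choices rather than the exact formulation of Welleck et al. (2020).

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, targets, alpha=1.0, pad_id=0):
    """Token-level unlikelihood training, sketched.

    logits:  [batch, seq_len, vocab] model outputs
    targets: [batch, seq_len] gold token ids
    Negative candidates at step t are the gold tokens seen at steps < t.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard MLE term.
    nll = F.nll_loss(log_probs.transpose(1, 2), targets, ignore_index=pad_id)

    batch, seq_len, vocab = logits.shape
    probs = log_probs.exp()

    # candidates[b, t, v] = True iff token v occurred in targets[b, :t].
    prev = targets.unsqueeze(1).expand(batch, seq_len, seq_len)      # [B, T, T]
    lower = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                  device=targets.device), diagonal=-1)
    candidates = torch.zeros(batch, seq_len, vocab,
                             dtype=torch.bool, device=logits.device)
    candidates.scatter_(2, prev.masked_fill(~lower, pad_id), True)
    candidates[:, :, pad_id] = False
    candidates.scatter_(2, targets.unsqueeze(-1), False)  # keep the gold token

    # Penalize log(1 - p) for every candidate token.
    ul = -(torch.log1p(-probs.clamp(max=1 - 1e-6)) * candidates).sum()
    ul = ul / candidates.sum().clamp(min=1)
    return nll + alpha * ul
```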