Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.448
Consistency of a Recurrent Language Model With Respect to Incomplete Decoding

Abstract: Despite strong performance on a variety of tasks, neural sequence models trained with maximum likelihood have been shown to exhibit issues such as length bias and degenerate repetition. We study the related issue of receiving infinite-length sequences from a recurrent language model when using common decoding algorithms. To analyze this issue, we first define inconsistency of a decoding algorithm, meaning that the algorithm can yield an infinite-length sequence that has zero probability under the model. We pro…
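The abstract's notion of inconsistency can be illustrated with a toy model (hypothetical, not from the paper): an autoregressive model that assigns probability 0.4 to EOS and 0.6 to a single token "a" at every step. Such a model terminates with probability 1 under ancestral sampling, yet greedy decoding always picks the argmax "a" and never emits EOS, producing an infinite sequence whose probability is lim 0.6^n = 0 — exactly the kind of inconsistent decoding the paper defines.

```python
import random

# Toy autoregressive model (hypothetical, for illustration only):
# at every step, p("a") = 0.6 and p(<eos>) = 0.4, independent of context.
P = {"a": 0.6, "<eos>": 0.4}

def greedy_decode(max_steps=20):
    """Greedy decoding always picks the argmax token "a", so it never
    emits <eos>: unbounded, it yields an infinite sequence with
    probability 0.6 ** inf = 0 under the model (inconsistency)."""
    out = []
    for _ in range(max_steps):
        tok = max(P, key=P.get)  # argmax is always "a"
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def ancestral_sample(rng):
    """Ancestral sampling terminates with probability 1, since each
    step ends the sequence with probability 0.4."""
    out = []
    while True:
        tok = rng.choices(list(P), weights=list(P.values()))[0]
        if tok == "<eos>":
            return out
        out.append(tok)

rng = random.Random(0)
print(len(greedy_decode()))        # always hits the step cap
print(len(ancestral_sample(rng)))  # finite with probability 1
```

The sketch only mimics the argmax-vs-sampling contrast; the paper's analysis concerns trained recurrent language models, where the same gap arises from context-dependent conditionals.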

Cited by 29 publications (27 citation statements). References 18 publications (40 reference statements).
“…Overall, this fine-tuning strategy is able to generate explanations that follow a style similar to the reference explanation. However, we identify cases where the model generates gibberish and/or repetitive text, which are problems previously reported in the literature while using GPT-2 (Holtzman et al., 2019; Welleck et al., 2020). To address these issues, we devise a strategy to remove unimportant sentences that could introduce noise to the generation process.…”
Section: Abstractive: GPT-2 Based
Mentioning confidence: 88%
“…In this case, the factors ( |x) defined by the autoregressive model are not actually the conditional probabilities of the weighted language (as defined by §2.1). It is true that training with a likelihood objective does encourage finding a weighted language whose generative process always terminates (hence = 1), since this is the behavior observed in the training corpus (Chi and Geman, 1998; Chen et al., 2018; Welleck et al., 2020). Our definitions of ELN(CP) models require the actual conditional probabilities to be efficiently computable.…”
Section: ELN and ELNCP Models
Mentioning confidence: 99%
“…Concretely, we consider two decoding approaches: a deterministic decoding algorithm that produces a set of sequences using beam search with beam-width k, and a stochastic decoding algorithm that forms a set of sequences using ancestral sampling until k unique sequences are obtained.¹ We refer readers to Welleck et al. (2020a) for detailed descriptions of those decoding algorithms.…”
Section: Neural Autoregressive Sequence Modeling
Mentioning confidence: 99%
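The stochastic decoding approach quoted above — ancestral sampling repeated until k unique sequences are collected — can be sketched as follows. This is a minimal toy sketch, not the cited authors' implementation: `sample_sequence` is a hypothetical stand-in for sampling from a trained language model's conditionals.

```python
import random

def sample_sequence(rng, p_eos=0.3, vocab=("a", "b")):
    """Hypothetical ancestral sampler: at each step, terminate with
    probability p_eos, otherwise emit a uniformly chosen token.
    A real decoder would sample from a trained LM's conditionals."""
    out = []
    while rng.random() >= p_eos:
        out.append(rng.choice(vocab))
    return tuple(out)  # tuples are hashable, so they can go in a set

def sample_k_unique(rng, k):
    """Stochastic decoding: keep ancestral-sampling until k unique
    sequences have been obtained, discarding duplicates."""
    hypotheses = set()
    while len(hypotheses) < k:
        hypotheses.add(sample_sequence(rng))
    return hypotheses

rng = random.Random(0)
hyps = sample_k_unique(rng, k=5)
print(len(hyps))  # 5 distinct sampled sequences
```

Note that this loop terminates only if the model's support contains at least k finite sequences; with an inconsistent model whose samples rarely terminate, collecting k unique finite sequences could stall, which is one practical face of the consistency issue the paper studies.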
“…However, recent studies suggest that the most likely sequences may not resemble training sequences at all. For instance, the learning stage can yield a distribution p_model which places high probability on empty (Stahlberg and Byrne, 2019) or repetitive (Holtzman et al., 2019) sequences, while the decoding stage can yield a distribution p_F which places non-zero mass on infinite-length sequences (Welleck et al., 2020a).…”
Section: Mode Recovery
Mentioning confidence: 99%