Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1331

On NMT Search Errors and Model Errors: Cat Got Your Tongue?

Abstract: We report on search errors and model errors in neural machine translation (NMT). We present an exact inference procedure for neural sequence models based on a combination of beam search and depth-first search. We use our exact search to find the global best model scores under a Transformer base model for the entire WMT15 English-German test set. Surprisingly, beam search fails to find these global best model scores in most cases, even with a very large beam size of 100. For more than 50% of the sentences, the …
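The exact inference procedure the abstract describes pairs beam search (which supplies a lower bound on the best attainable model score) with depth-first search over the space of output sequences: since every token extension adds a log-probability of at most zero, a prefix's score upper-bounds the score of all its completions, so any prefix scoring below the bound can be pruned without risking a search error. A minimal sketch of that idea follows, assuming a hypothetical model interface log_probs(prefix) that returns per-token conditional log-probabilities, and a hypothetical EOS token id; it is an illustration of the pruning argument, not the authors' implementation.

    import math

    EOS = 0  # hypothetical end-of-sequence token id

    def exact_search(log_probs, vocab_size, lower_bound=-math.inf, max_len=100):
        """Depth-first search for the single best-scoring sequence.

        Because each extension adds a log-probability <= 0, a prefix's score
        upper-bounds the score of every completion, so branches scoring below
        the incumbent best can be pruned safely (no search errors introduced).
        """
        best_score, best_seq = lower_bound, None

        def dfs(prefix, score):
            nonlocal best_score, best_seq
            if len(prefix) >= max_len:
                return
            scores = log_probs(prefix)  # log P(token | source, prefix), one per token
            # Expand high-probability tokens first so the bound tightens early.
            for tok in sorted(range(vocab_size), key=lambda t: -scores[t]):
                new_score = score + scores[tok]
                if new_score <= best_score:
                    break  # remaining tokens score even lower: prune them all
                if tok == EOS:
                    best_score, best_seq = new_score, prefix + [tok]
                else:
                    dfs(prefix + [tok], new_score)

        dfs([], 0.0)
        return best_seq, best_score

Seeding lower_bound with the score of the best beam-search hypothesis gives the depth-first search a tight incumbent from the start, which is what makes exact enumeration tractable on a real test set.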

Citation Types: 10 supporting, 141 mentioning, 2 contrasting

Year Published: 2020–2024

Cited by 100 publications (153 citation statements)
References 15 publications (21 reference statements)

“…This problem has also been reported in other conditional generation tasks (Sountsov and Sarawagi, 2016; Stahlberg and Byrne, 2019); we leave it for future work.…”
supporting
confidence: 55%

“…Neural sequence models trained with maximum likelihood estimation (MLE) have become a standard approach to modeling sequences in a variety of natural language applications such as machine translation (Bahdanau et al., 2015), dialogue modeling (Vinyals et al., 2015), and language modeling (Radford et al., 2019). Despite this success, MLE-trained neural sequence models have been shown to exhibit issues such as length bias (Sountsov and Sarawagi, 2016; Stahlberg and Byrne, 2019) and degenerate repetition (Holtzman et al., 2019).…”
Section: Introduction
mentioning
confidence: 99%

“…Koehn and Knowles (2017) raise six challenges for machine translation, including degrading performance for longer sentences and degrading performance for larger beam sizes. Stahlberg and Byrne (2019) distinguish model errors (high probabilities of bad sequences) from search errors (failing to find sequences preferred by the model). They show that the globally optimal translations (according to likelihood) are considerably worse than the translations found by beam search.…”
Section: Related Work
mentioning
confidence: 99%