If beam search is the answer, what was the question?

Meister, Clara; Cotterell, Ryan; Vieira, Tim

doi:10.18653/v1/2020.emnlp-main.170

Cited by 73 publications

(74 citation statements)

References 35 publications

“…Note the tendency of LMs to assign unreasonably high probabilities to segments has also attracted attention from the viewpoint of memorization capability of LMs (Carlini et al, 2020). In addition, the connection of the UID hypothesis to the modern NLP techniques has been recently explored (Meister et al, 2020; Wei et al, 2021). We further investigate our hypothesis in Section 5.…”

Section: Discussion: Uniform Information Density (mentioning)

confidence: 99%

Lower Perplexity is Not Always Human-Like

Kuribayashi, Oseki, Ito et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer

In computational psycholinguistics, various language models have been evaluated against human reading behavior (e.g., eye movement) to build human-like computational models. However, most previous efforts have focused almost exclusively on English, despite the recent trend towards linguistic universal within the general community. In order to fill the gap, this paper investigates whether the established results in computational psycholinguistics can be generalized across languages. Specifically, we re-examine an established generalization (the lower perplexity a language model has, the more human-like the language model is) in Japanese with typologically different structures from English. Our experiments demonstrate that this established generalization exhibits a surprising lack of universality; namely, lower perplexity is not always human-like. Moreover, this discrepancy between English and Japanese is further explored from the perspective of (non-)uniform information density. Overall, our results suggest that a cross-lingual evaluation will be necessary to construct human-like computational models.

“…Specifically, the RL model improves BLEU scores on long sentences by 3+ BLEU points and BP on those sentences by about 9+ points. This shows that our model, via smart segmentation, suffers less because of premature truncation of long translations as compared to the baseline, a common problem (Meister et al, 2020; Koehn and Knowles, 2017). While segmentation of long sentences at appropriate punctuations helps performance, segmentation at all punctuations is expected to hurt performance as it is highly likely to produce extremely small segments which lose a lot of necessary source context when individually translated.…”

Section: Results (mentioning)

confidence: 96%

Better Chinese Sentence Segmentation with Reinforcement Learning

Srinivasan, Dyer 2021

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

A long-standing challenge in Chinese-English machine translation is that sentence boundaries are ambiguous in Chinese orthography, but inferring good splits is necessary for obtaining high quality translations. To solve this, we use reinforcement learning to train a segmentation policy that splits Chinese texts into segments that can be independently translated so as to maximise the overall translation quality. We compare to a variety of segmentation strategies and find that our approach improves the baseline BLEU score on the WMT2020 Chinese-English news translation task by +0.3 BLEU overall and improves the score on input segments that contain more than 60 words by +3 BLEU.

“…For tasks like MT, this is not the case: Eikema and Aziz (2020) pointed out that the argmax receives so little mass that it is almost arbitrary, so seeking it with MAP decoding (which beam search approximates) itself causes many deficiencies of decoding. On the other hand, Meister et al (2020a) showed that beam search has a helpful bias and introduced regularization penalties for MAP decoding that encode it explicitly. Entmax neither directly addresses the faults of MAP decoding nor compensates for the locality biases of beam search, instead shrinking the gap between beam search and exact decoding.…”

Section: Related Work (mentioning)

confidence: 99%

Smoothing and Shrinking the Sparse Seq2Seq Search Space

Peters, Martins 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax, the so-called "cat got your tongue" problem. Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since they can shrink the search space by assigning zero probability to bad hypotheses, but their ability to handle word-level tasks with transformers has never been tested. In this work, we show that entmax-based models effectively solve the "cat got your tongue" problem, removing a major source of model error for neural machine translation. In addition, we generalize label smoothing, a critical regularization technique, to the broader family of Fenchel-Young losses, which includes both cross-entropy and the entmax losses. Our resulting label-smoothed entmax loss models set a new state of the art on multilingual grapheme-to-phoneme conversion and deliver improvements and better calibration properties on cross-lingual morphological inflection and machine translation for 7 language pairs.
