Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.92

The Curious Case of Hallucinations in Neural Machine Translation

Abstract: In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman (2020), and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations…

Cited by 75 publications (99 citation statements) | References 24 publications
“…Hallucinations. To estimate the number of hallucinations produced by the systems evaluated, we follow the procedure proposed by and used by Raunak et al. (2021). Although their interest was in detecting the sentences that induced the generation of hallucinations after spurious tokens were introduced into the input, we adapted it to automatically measure the number of input sentences in a test set for which the corresponding output appears to be a hallucination.…”
Section: Explainability
Mentioning confidence: 99%
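The perturbation-based procedure quoted above lends itself to a short sketch. The following is a minimal illustration rather than the cited implementation: `translate` is a hypothetical stand-in for any NMT system, the spurious tokens and the BLEU threshold are illustrative assumptions, and sacrebleu's sentence-level BLEU stands in for whatever overlap measure the original procedure uses.

```python
# Hedged sketch of a perturbation-based hallucination check in the spirit
# of Raunak et al. (2021). Tokens and threshold are illustrative only.
from sacrebleu import sentence_bleu

PERTURBATION_TOKENS = ["qz", "#", "0"]  # illustrative spurious tokens
BLEU_THRESHOLD = 3.0                    # illustrative: near-zero overlap

def is_hallucination_under_perturbation(src: str, translate) -> bool:
    """Flag `src` if inserting a spurious token makes the output diverge
    almost completely from the unperturbed translation."""
    base_hyp = translate(src)  # `translate` is a hypothetical NMT callable
    for tok in PERTURBATION_TOKENS:
        perturbed_hyp = translate(f"{tok} {src}")
        # Very low overlap with the original output suggests the model
        # detached from the source and hallucinated.
        if sentence_bleu(perturbed_hyp, [base_hyp]).score < BLEU_THRESHOLD:
            return True
    return False
```

Counting the flagged sentences over a test set then yields the adapted corpus-level measure the citing authors describe.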
“…First, the back-translation transformation removes content style but does not necessarily replace attribute markers the way style-transfer models do; for example, given the text "me and my husband ...", style-transfer models are likely to change "husband" to "wife", but back-translation will not. Second, our back-translation technique also inherits some of the problems of machine-translated text, such as hallucination (Raunak et al., 2021). We provide examples highlighting these issues in Appendix C.…”
Section: Discussion
Mentioning confidence: 99%
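To make the round-trip idea concrete, here is a minimal back-translation sketch. The pivot language (French) and the Helsinki-NLP MarianMT checkpoints are assumptions chosen for illustration; the cited work does not necessarily use these models.

```python
# Minimal round-trip back-translation sketch. Pivot language and model
# checkpoints are illustrative assumptions, not the cited setup.
from transformers import MarianMTModel, MarianTokenizer

def _translate(texts, model_name):
    tok = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch)
    return [tok.decode(t, skip_special_tokens=True) for t in out]

def back_translate(texts):
    """English -> French -> English: paraphrases content but, as noted
    above, does not deliberately rewrite attribute markers and can
    inherit MT pathologies such as hallucination."""
    pivot = _translate(texts, "Helsinki-NLP/opus-mt-en-fr")
    return _translate(pivot, "Helsinki-NLP/opus-mt-fr-en")

print(back_translate(["me and my husband went to the market"]))
```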
“…Lastly, unlike the aforementioned tasks, the categorizations of hallucinations in machine translation vary within the task. Most of the relevant literature agrees that a translation is considered a hallucination when it is completely disconnected from the source text [91,125,145]. For further details, please refer to Section 11.…”
Section: Task Comparison
Mentioning confidence: 99%
“…They discovered that such likelihood-maximization approaches can result in degeneration, which refers to generated output that is bland, incoherent, or stuck in repetitive loops [71,185]. Concurrently, it was discovered that NLG models often generate text that is nonsensical or unfaithful to the provided source input [82,145,150,178]. Researchers began referring to such undesirable generation as hallucination [117].…”
Section: Introduction
Mentioning confidence: 99%
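The "repetitive loops" symptom of degeneration mentioned above can be quantified with a crude n-gram statistic. The function below is an illustrative heuristic, not a metric from the cited papers; any threshold on it would need tuning per task.

```python
# Illustrative heuristic for the repetition symptom of degeneration:
# the fraction of duplicate n-grams in an output string.
def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

# A degenerate, looping output scores far higher than fluent text.
print(repeated_ngram_fraction("I don't know. I don't know. I don't know."))   # 0.5
print(repeated_ngram_fraction("The cat sat quietly on the warm windowsill."))  # 0.0
```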