Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1256
A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation

Abstract: Recent neural language generation systems often hallucinate contents (i.e., producing irrelevant or contradicted facts), especially when trained on loosely corresponding pairs of the input structure and text. To mitigate this issue, we propose to integrate a language understanding module for data refinement with self-training iterations to effectively induce strong equivalence between the input data and the paired text. Experiments on the E2E challenge dataset show that our proposed framework can reduce more th…
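The refinement loop the abstract describes — bootstrap an NLU model from the noisy (input, text) pairs, then use it to re-induce each input from its paired text over several iterations — can be sketched as follows. This is a minimal toy illustration, not the authors' code: the "NLU" here simply learns which slot values appear in training inputs and keeps only those a text literally mentions.

```python
# Hypothetical sketch of data refinement via self-training, as described
# in the abstract. train_nlu/refine and the toy slot-matching "NLU" are
# illustrative assumptions, not the paper's actual model.

def train_nlu(pairs):
    """Toy NLU: collect every slot value seen in the inputs, then
    'parse' a text by keeping the values it literally mentions."""
    vocab = {value for slots, _ in pairs for value in slots}

    def parse(text):
        return frozenset(v for v in vocab if v in text)

    return parse

def refine(pairs, n_iters=2):
    """Each iteration re-labels every input from its paired text,
    tightening the equivalence between input data and text."""
    for _ in range(n_iters):
        nlu = train_nlu(pairs)
        pairs = [(nlu(text), text) for _, text in pairs]
    return pairs

# Noisy pair: the input claims "riverside" but the text never says it.
noisy = [(frozenset({"Italian", "riverside"}),
          "An Italian restaurant in town.")]
print(refine(noisy))  # the unsupported "riverside" slot is dropped
```

A generator trained on the refined pairs then only ever sees inputs its target text actually supports, which is the intended hallucination-reduction mechanism.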


Cited by 60 publications (64 citation statements)
References 21 publications
“…Contemporaneous with our work is the effort of Nie et al. (2019), who focus on automatic data cleaning using an NLU iteratively bootstrapped from the noisy data. Their analysis similarly finds that omissions are more common than hallucinations.…”
Section: Discussion and Related Work
confidence: 99%
“…The former path is risky as it easily results in ungrammatical targets. The latter approach of enforcing a stronger alignment between inputs and outputs has been tried previously, but it assumes a moderate amount of noise in the data (Nie et al., 2019; Dušek et al., 2019). Alternatively, one can leave the data as is and try to put more pressure on the decoder to pay attention to the input at every generation step (Tian et al., 2019).…”
Section: Introduction
confidence: 99%
“…It is worth noting that SR can be regarded as a variant of self-training due to its structural similarity, except that it takes the target sentences rather than the source sentences as input to the model. The algorithm itself is the key difference from existing methods based on self-training (Wang, 2019; Nie et al., 2019; Xie et al., 2020).…”
Section: Proposed Denoising Methods
confidence: 99%