Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.120

Detecting Hallucinated Content in Conditional Neural Sequence Generation

Abstract: Neural sequence models can generate highly fluent sentences, but recent studies have shown that they are also prone to hallucinating additional content not supported by the input. This variety of fluent but wrong output is particularly problematic, as users cannot tell that they are being presented with incorrect content. To detect these errors, we propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input) and collect new manually annota…
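The task the abstract describes is per-token binary labeling of the model output. As a rough illustration of that interface only (the paper's actual detector is a learned model, not lexical matching), a naive overlap baseline might flag every output token that never occurs in the source:

    def label_hallucinations(source: str, output: str) -> list[tuple[str, int]]:
        """Naive illustration of the token-level task: 1 = hallucinated, 0 = supported.

        Hypothetical baseline only; the paper trains a detector rather than
        matching tokens lexically.
        """
        source_tokens = {tok.lower() for tok in source.split()}
        return [(tok, 0 if tok.lower() in source_tokens else 1)
                for tok in output.split()]

    src = "The festival was held in Paris in 2019 ."
    out = "The festival took place in London in 2019 ."
    print(label_hallucinations(src, out))
    # "London" is correctly flagged, but so are "took" and "place":
    # paraphrases are exactly what makes lexical overlap too crude a detector.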

Cited by 67 publications (62 citation statements); References 38 publications
“…Hallucination in text-generation models is a topic that has received attention recently, particularly in the settings of summarization (Maynez et al., 2020), machine translation (Zhou et al., 2021), and news generation (Zellers et al., 2019). For dialogue, it has been observed in state-of-the-art models (Roller et al., 2021) and studied in depth (Mielke et al., 2020), but so far without resolution.…”
Section: Related Work
confidence: 99%
“…There has been a lot of recent work in abstractive summarization showing that state-of-the-art systems suffer from generating information inconsistent with the source article, despite their improved success in producing fluent summaries (Falke et al., 2019; Lux et al., 2020). Since word-overlap based metrics such as ROUGE have low correlation with human scores of faithfulness (Kryscinski et al., 2019; Fabbri et al., 2020), there has been significant effort to develop automated metrics that can detect such errors (Zhou et al., 2021; Gabriel et al., 2021; Pagnoni et al., 2021a). For example, Falke et al. (2019), Maynez et al. (2020) and Goyal and Durrett (2020) have proposed to assess faithfulness using entailment models, where a faithful summary should be assigned a high entailment score with respect to the original article.…”
Section: Related Work
confidence: 99%
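The entailment idea this snippet describes can be sketched in a few lines. This is a minimal sketch, assuming the Hugging Face transformers library and the public roberta-large-mnli checkpoint (illustrative choices, not specified by the cited works): score a summary by the probability that the article, as premise, entails it.

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Illustrative checkpoint choice; any NLI classifier would do here.
    MODEL = "roberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

    def entailment_score(article: str, summary: str) -> float:
        """P(article entails summary) under the NLI model."""
        inputs = tokenizer(article, summary, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**inputs).logits.softmax(dim=-1).squeeze(0)
        # Read the entailment index from the config rather than hardcoding it.
        ent_idx = {v.lower(): k for k, v in model.config.id2label.items()}["entailment"]
        return probs[ent_idx].item()

    article = "The company reported a 10% rise in quarterly revenue."
    print(entailment_score(article, "Revenue grew last quarter."))   # high score expected
    print(entailment_score(article, "The company went bankrupt."))   # low score expected

In the cited work this kind of check is typically applied per summary sentence and then aggregated, since a long premise dilutes the entailment signal.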
“…We note the relatively small training time of the mBART adaptation and the lack of Icelandic data in mBART's pretraining task as the primary factors that could be addressed to improve results. Additionally, online (or semi-online) self-training instead of train-then-translate would also improve results, especially with selective loss truncation as described in Zhou et al. (2021). The data selected for backtranslation should also be expanded for greater diversity of both genre and vocabulary.…”
Section: Future Work
confidence: 99%
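The "selective loss truncation" mentioned here refers to using token-level hallucination predictions to discard unreliable tokens when training on noisy (back)translated data. A minimal PyTorch sketch under that reading, with the hallucination mask assumed to come from an external detector:

    import torch
    import torch.nn.functional as F

    def truncated_nll(logits: torch.Tensor,
                      targets: torch.Tensor,
                      hallucinated: torch.Tensor) -> torch.Tensor:
        """Token-level NLL that zeroes out the loss on tokens flagged as hallucinated.

        logits:       (batch, seq_len, vocab) decoder outputs
        targets:      (batch, seq_len) reference token ids
        hallucinated: (batch, seq_len) 1 where an external detector flags the
                      reference token as unsupported, 0 otherwise (assumed given)
        """
        nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
        keep = 1.0 - hallucinated.float()
        # Average only over kept tokens, guarding against an all-masked batch.
        return (nll * keep).sum() / keep.sum().clamp(min=1.0)

This version uses a hard 0/1 mask; a softer variant could instead scale each token's loss by the detector's confidence.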