Abstract: Despite significant progress in neural abstractive summarization, recent studies have shown that current models are prone to generating summaries that are unfaithful to the original context. To address the issue, we study contrast candidate generation and selection as a model-agnostic post-processing technique to correct extrinsic hallucinations (i.e., information not present in the source text) in unfaithful summaries. We learn a discriminative correction model by generating alternative candidate summaries…
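The candidate-generation step this abstract describes is easy to sketch: swap each named entity in the summary for same-type entities found in the source, producing contrast candidates for a learned model to rank. Below is a minimal, illustrative version, assuming spaCy for entity recognition; it is not the authors' implementation, and the function name is ours.

```python
# Illustrative sketch of contrast candidate generation (not the paper's code).
# Each named entity in the summary is replaced with same-type entities from
# the source document, yielding alternative candidates for a selector to rank.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def generate_candidates(source: str, summary: str) -> list[str]:
    # Group source entities by semantic type (PERSON, DATE, CARDINAL, ...).
    source_ents: dict[str, set[str]] = {}
    for ent in nlp(source).ents:
        source_ents.setdefault(ent.label_, set()).add(ent.text)

    candidates = []
    for ent in nlp(summary).ents:
        # Swap in every same-type source entity except the original mention.
        for alt in sorted(source_ents.get(ent.label_, set()) - {ent.text}):
            candidates.append(summary[:ent.start_char] + alt + summary[ent.end_char:])
    return candidates
```

In the paper, a discriminative correction model then scores the original summary against these candidates and keeps the best one; any faithfulness scorer could stand in for it in this sketch.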
“…Multiple terminologies, such as faithfulness [20,22,50,117,133,144,163,172,195,219], factual consistency [18,19,24,154,157,194], fidelity [23], factualness [146], factuality [33],…”
Section: Human Evaluation (citation type: mentioning; confidence: 99%)
“…or, on the other hand, hallucination [40,73,107,154,158] and fact contradicting [129] are used in the human evaluation of hallucination to rate whether the generated text is in accord with the source input. Chen et al. [22] and Nie et al. [130] use finer-grained metrics for intrinsic hallucination and extrinsic hallucination separately. Moreover, there are some broad metrics, such as Correctness [7,12,98,182], Accuracy [97,203], and Informativeness [102], that consider both missing and additional content (extrinsic hallucinations) relative to the input source.…”
Section: Human Evaluation (citation type: mentioning; confidence: 99%)
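A common automatic proxy for the intrinsic/extrinsic distinction drawn in the snippet above checks whether the summary's entities ever appear in the source: those that do not are likely extrinsic hallucinations. A toy version of that check, again assuming spaCy (illustrative names, not a metric from the surveyed papers):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extrinsic_entity_flags(source: str, summary: str) -> list[str]:
    """Return summary entities that never occur in the source text, a crude
    proxy for extrinsic hallucination (content not grounded in the input)."""
    source_lower = source.lower()
    return [ent.text for ent in nlp(summary).ents
            if ent.text.lower() not in source_lower]
```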
“…Consequently, a better semantic understanding helps alleviate divergence from the source. For example, models have been augmented with entity information [107], relation triples extracted from the source document [20,73] via fact description extraction, synthetic data generated through replacement or perturbation [22,91], retrieved external knowledge [12,45,65,158,222], and retrieved similar training samples [13].…”
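As one concrete example of these augmentations, relation triples can be approximated with a dependency parse. A very rough sketch follows; real systems use dedicated fact-description or OpenIE extractors, and the function name here is ours:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def naive_triples(text: str) -> list[tuple[str, str, str]]:
    """Rough (subject, relation, object) extraction via dependency parsing,
    standing in for the fact-description-extraction step mentioned above."""
    triples = []
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ != "VERB":
                continue
            subjects = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in tok.children if c.dep_ in ("dobj", "obj", "attr")]
            if subjects and objects:
                triples.append((subjects[0].text, tok.lemma_, objects[0].text))
    return triples
```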
Natural Language Generation (NLG) has improved substantially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent natural language generation and, in turn, to progress in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, it is also apparent that deep learning-based generation is prone to hallucinating unintended text, which degrades system performance and fails to meet user expectations in many real-world scenarios. To address this issue, there have been studies on measuring and mitigating hallucinated text; however, there has not been a comprehensive review of the state of the art in hallucination detection and mitigation. In this survey, we provide a broad overview of the research progress and challenges concerning the hallucination problem in NLG. The survey is organized into two parts: (1) a general overview of metrics, mitigation methods, and future directions; and (2) an overview of task-specific research progress on hallucination in a large set of downstream tasks, namely abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated text in NLG.
“…Improving the faithfulness of summarization systems is essential for deploying these systems in real-world scenarios; as such, recent work has studied methods to improve the faithfulness of abstractive summarization systems (Zhao et al., 2020; Dong et al., 2020; Goyal and Durrett, 2021; Xu et al., 2020; Chen et al., 2021; Zhu et al., 2021). For example, Goyal and Durrett (2021) train summarization systems by modifying the training objective to maximize the likelihood of the subset of summary tokens that are considered faithful according to their factuality detection model.…”
Section: Related Work (citation type: mentioning; confidence: 99%)
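The modified objective attributed to Goyal and Durrett (2021) in the snippet above amounts to a token-level cross-entropy restricted to tokens a factuality detector deems faithful. A schematic PyTorch version under that reading, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def faithful_token_mle(logits: torch.Tensor,
                       targets: torch.Tensor,
                       faithful_mask: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood over only the tokens marked faithful.

    logits:        (batch, seq_len, vocab) decoder outputs
    targets:       (batch, seq_len) gold summary token ids
    faithful_mask: (batch, seq_len) 1.0 where a detector judges the token faithful
    """
    # Per-token NLL; cross_entropy expects the class dim second.
    nll = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    # Average only over the faithful tokens (guard against an all-zero mask).
    return (nll * faithful_mask).sum() / faithful_mask.sum().clamp(min=1.0)
```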
“…as well as methods to improve the faithfulness of generated summaries (Kang and Hashimoto, 2020; Chen et al., 2021). Intuitively, one straightforward way of improving the faithfulness of generated summaries is to copy a larger amount of content from the source article (i.e.…”
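The degree of copying alluded to here is usually quantified with extractive fragment statistics in the style of Grusky et al. (2018). A simplified coverage computation, using greedy longest-match and intended only as illustration:

```python
def extractive_coverage(source_tokens: list[str], summary_tokens: list[str]) -> float:
    """Fraction of summary tokens lying in fragments copied verbatim from the
    source (greedy longest-match; a simplified take on Grusky et al., 2018)."""
    covered, i = 0, 0
    while i < len(summary_tokens):
        # Longest source match starting at summary position i.
        longest = 0
        for j in range(len(source_tokens)):
            k = 0
            while (i + k < len(summary_tokens) and j + k < len(source_tokens)
                   and summary_tokens[i + k] == source_tokens[j + k]):
                k += 1
            longest = max(longest, k)
        if longest:
            covered += longest
            i += longest
        else:
            i += 1
    return covered / max(len(summary_tokens), 1)
```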
Despite recent progress in abstractive summarization, systems still suffer from faithfulness errors. While prior work has proposed models that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness in the model outputs, since one naive way to improve faithfulness is to make summarization models more extractive. In this work, we present a framework for evaluating the effective faithfulness of summarization systems by generating a faithfulness-abstractiveness trade-off curve that serves as a control at different operating points on the abstractiveness spectrum. We then show that the Maximum Likelihood Estimation (MLE) baseline, as well as a recently proposed method for improving faithfulness, are both worse than the control at the same level of abstractiveness. Finally, we learn a selector to identify the most faithful and abstractive summary for a given document, and show that this system can attain higher faithfulness scores in human evaluations while being more abstractive than the baseline system on two datasets. Moreover, we show that our system achieves a better faithfulness-abstractiveness trade-off than the control at the same level of abstractiveness.
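The selector described in this abstract can be caricatured as constrained ranking: among candidate summaries, prefer high faithfulness, but only at or below a target extractiveness, so that faithfulness gains cannot come from copying alone. A minimal sketch, with every name and the threshold illustrative rather than the paper's actual learned system:

```python
def select_summary(candidates: list[str],
                   faithfulness: list[float],
                   extractiveness: list[float],
                   max_coverage: float = 0.6) -> str:
    """Return the most faithful candidate whose extractive coverage stays
    below the threshold; fall back to the full pool if none qualifies."""
    pool = [i for i, c in enumerate(extractiveness) if c <= max_coverage]
    pool = pool or list(range(len(candidates)))
    return candidates[max(pool, key=lambda i: faithfulness[i])]
```

Scores such as `faithfulness` would come from human judgments or a learned scorer, and `extractiveness` from a coverage measure like the one sketched above.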