Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.536
Improving Factual Consistency of Abstractive Summarization via Question Answering

Abstract: A commonly observed problem with state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents. The fact that automatic summarization may produce plausible-sounding yet inaccurate summaries is a major concern that limits its wide application. In this paper we present an approach to address factual consistency in summarization. We first propose an efficient automatic evaluation metric to measure factual consistency; next, we propose a…

Cited by 44 publications (59 citation statements)
References 23 publications
“…Besides, the input text of text summarization sometimes refers to news that includes world facts. To make summarization models produce more factual summaries, some works have proposed evaluation metrics or correction methods to measure and revise the generated text so that factuality is preserved [35,133].…”
Section: Optimization View
Citation type: mentioning, confidence: 99%
“…The additional procedure generated question-answer pairs from the source document and answered the questions from the generated text. In contrast to QuestEval, QUALS (Nan et al., 2021a) simplified steps 1-3 of the above procedure with a single neural language model (QAGen). QUALS employs QAGen, as proposed in (Shakeri et al., 2020), to generate both the questions and answers from the generated text.…”
Section: Answer Alignment Evaluation
Citation type: mentioning, confidence: 99%
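The citation statement above describes the general shape of QA-based consistency metrics: generate question-answer pairs from one text, re-answer the questions against the other text, and score agreement. The following is a minimal sketch of that idea only, not the authors' QUALS implementation; `generate_qa_pairs` and `answer_from_source` are hypothetical stand-ins using trivial string heuristics in place of the neural QAGen and QA models.

```python
# Sketch of a QA-based factual consistency check. Real systems use neural
# models for QA generation and answering; here, simple "X is Y" heuristics
# stand in so the control flow of the metric is visible.

def generate_qa_pairs(summary):
    # Hypothetical stand-in for a QAGen-style model: treat each
    # "X is Y" sentence as the QA pair ("What is X?", "Y").
    pairs = []
    for sent in summary.split("."):
        if " is " in sent:
            subj, _, obj = sent.strip().partition(" is ")
            pairs.append((f"What is {subj}?", obj))
    return pairs

def answer_from_source(question, source):
    # Hypothetical stand-in for a QA model: answer the question by
    # looking for a matching "X is ..." sentence in the source.
    subj = question[len("What is "):-1]
    for sent in source.split("."):
        stripped = sent.strip()
        if stripped.startswith(subj + " is "):
            return stripped[len(subj + " is "):]
    return ""

def consistency_score(summary, source):
    # Fraction of summary-derived QA pairs whose answer is confirmed
    # by re-answering the question against the source document.
    pairs = generate_qa_pairs(summary)
    if not pairs:
        return 0.0
    agree = sum(1 for q, a in pairs if answer_from_source(q, source) == a)
    return agree / len(pairs)

source = "Paris is the capital of France. The Seine is a river."
print(consistency_score("Paris is the capital of France.", source))  # 1.0
print(consistency_score("Paris is the capital of Spain.", source))   # 0.0
```

A factually consistent summary scores high because the source confirms its answers; a hallucinated claim produces a mismatched answer and lowers the score.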
“…Much work in this area concerns improving factuality and factual consistency (Shuster et al., 2021; Nan et al., 2021; Mao et al., 2020; Aralikatte et al., 2021). While this is one aspect of our work, we also aim to improve automatic evaluation, for which a single standard metric has not emerged.…”
Section: Factuality and Factual Consistency
Citation type: mentioning, confidence: 99%
“…While this is one aspect of our work, we also aim to improve automatic evaluation, for which a single standard metric has not emerged. Some works evaluate factuality and consistency with extraction (Goodrich et al., 2019; Zhang et al., 2020) or question answering (Wang et al., 2020; Durmus et al., 2020; Nan et al., 2021). Others use notions of entailment (Falke et al., 2019), or simply train end-to-end models to judge these aspects directly (Kryscinski et al., 2020).…”
Section: Factuality and Factual Consistency
Citation type: mentioning, confidence: 99%