Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.415
CONFIT: Toward Faithful Dialogue Summarization with Linguistically-Informed Contrastive Fine-tuning

Abstract: Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization. Although significant progress has been achieved by using pre-trained neural language models, substantial amounts of hallucinated content are found during human evaluation. In this work, we first devised a typology of factual errors to better understand the types of hallucinations generated by current models, and conducted a human evaluation on a popular dialogue summarization dataset. We…

Cited by 17 publications (22 citation statements). References 19 publications.
“…Factual consistency. As mentioned in Section 2, information inconsistency (Kryscinski et al., 2019, 2020) is a common problem of general text summarization systems, especially in the meeting domain (Tang et al., 2022). This suggests that future research should focus on dealing with hallucinated content in generated meeting summaries.…”
Section: Future Directions
confidence: 96%
“…between a summary and its source (Huang et al., 2021). It is reported that nearly 30% of summaries generated by neural seq2seq models suffer from fact fabrication (Cao et al., 2018), and in the dialogue domain, most factual errors are related to dialogue flow modeling, informal interactions between speakers, and complex coreference resolution (Tang et al., 2022). Given the special characteristics of dialogues, more studies will be needed to develop more appropriate metrics for dialogue summarization (Zechner and Waibel, 2000).…”
Section: [Problems]
confidence: 99%
“…Contrastive learning for faithfulness has been applied to fine-tuning (Nan et al., 2021b; Tang et al., 2022; Cao and Wang, 2021a), post-hoc editing (Cao et al., 2020; Zhu et al., 2021), re-ranking (Chen et al., 2021), and evaluation (Kryscinski et al., 2020; Deng et al., 2021a). This line of research has largely focused on the methods used to generate synthetic errors for negative contrast sets, i.e., by directly mimicking errors observed during human evaluation (Tang et al., 2022), entity swapping (Cao and Wang, 2021a), language model infilling (Cao and Wang, 2021a), or using unfaithful system outputs (Nan et al., 2021b). Orthogonal to our work, Cao and Wang (2021a) assess the relative efficacy of a diverse set of corruption methods when used for contrastive fine-tuning for faithfulness.…”
Section: Related Work
confidence: 99%
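The entity-swapping corruption mentioned in the statement above is straightforward to illustrate. Below is a minimal sketch, assuming a spaCy NER pipeline; the function name and the swap policy are illustrative assumptions, not the cited authors' exact implementation.

```python
# Minimal sketch of entity swapping to build a negative contrast set,
# one of the corruption methods for contrastive fine-tuning cited above.
# The pipeline choice and swap policy are assumptions for illustration.
import random
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed small English NER pipeline

def entity_swap(summary: str, seed: int = 0) -> str:
    """Corrupt a faithful summary by swapping two entities of the same
    type, producing a fluent but unfaithful negative example."""
    rng = random.Random(seed)
    doc = nlp(summary)
    # Group entity mentions by label (PERSON, ORG, DATE, ...).
    by_label: dict[str, list[str]] = {}
    for ent in doc.ents:
        by_label.setdefault(ent.label_, []).append(ent.text)
    # Find a label with at least two distinct surface forms to swap.
    for label, mentions in by_label.items():
        texts = sorted(set(mentions))
        if len(texts) >= 2:
            a, b = rng.sample(texts, 2)
            # Swap the two mentions everywhere via a placeholder.
            return (summary.replace(a, "\x00")
                           .replace(b, a)
                           .replace("\x00", b))
    return summary  # no swappable entities; return unchanged

positive = "Amanda will call Jerry after the meeting on Friday."
negative = entity_swap(positive)
print(negative)  # e.g., "Jerry will call Amanda after the meeting on Friday."
```

The swapped output preserves fluency while contradicting the source, which is what makes it a useful hard negative for contrastive objectives.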
“…The SAMSum corpus is a large-scale dialogue summarization dataset that contains 16k English daily conversations with corresponding summaries written by linguists. We use the human annotations of SAMSum summaries in ConFiT (Tang et al., 2022) as our meta-evaluation dataset, in which summaries generated by six summarization models were rated for faithfulness on a scale of 1-10. We refer to this dataset as MetaSAMSum.…”
Section: Metrics and Data
confidence: 99%
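To make the meta-evaluation setup concrete, here is a hedged sketch of how an automatic faithfulness metric could be scored against 1-10 human ratings of the kind described above; the record schema and toy values are hypothetical, not MetaSAMSum's actual format.

```python
# Sketch of meta-evaluating a faithfulness metric against human ratings.
# Field names and toy records are hypothetical placeholders.
from scipy.stats import pearsonr, spearmanr

# Each record pairs a system summary's metric score with its human rating.
records = [
    {"metric_score": 0.91, "human_faithfulness": 9},
    {"metric_score": 0.74, "human_faithfulness": 6},
    {"metric_score": 0.32, "human_faithfulness": 2},
    {"metric_score": 0.58, "human_faithfulness": 5},
]

metric = [r["metric_score"] for r in records]
human = [r["human_faithfulness"] for r in records]

# Correlation with human judgments is the standard criterion for
# comparing faithfulness metrics on a meta-evaluation set.
print("Pearson: ", pearsonr(metric, human)[0])
print("Spearman:", spearmanr(metric, human)[0])
```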
“…Kryscinski et al. (2020) found that up to 30% of generated summaries are affected by factual inconsistencies. Tang et al. (2022) studied the types of factual errors produced by current models on a popular dialogue summarization dataset and revealed hallucination issues. Having metrics that can reliably identify hallucinations and source-contradicting information is therefore a critical step in summarization research.…”
Section: Introduction
confidence: 99%