2022
DOI: 10.1609/aaai.v36i10.21299

InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Abstract: Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and involve non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy for quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of…
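To make the idea concrete, here is a minimal sketch of an InfoLM-style score, not the authors' implementation: each token of a sentence is masked in turn, the masked language model's predicted distribution over the vocabulary is collected, the per-position distributions are averaged into one distribution per sentence, and candidate and reference are compared with an information measure. The model choice (bert-base-uncased), the uniform averaging, and the use of KL divergence are assumptions for illustration; the paper studies a whole family of such measures.

# Sketch of an InfoLM-style metric (illustrative, not the authors' code).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"  # assumption: any masked LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def vocab_distribution(text: str) -> torch.Tensor:
    """Mask each token position in turn, collect the MLM's predictive
    distribution at that position, and average them into a single
    distribution over the vocabulary."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    dists = []
    for pos in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        dists.append(torch.softmax(logits, dim=-1))
    return torch.stack(dists).mean(dim=0)

def infolm_like(candidate: str, reference: str) -> float:
    """KL(reference || candidate) between the two vocabulary distributions;
    KL is one member of the family of measures the paper considers."""
    p = vocab_distribution(reference)
    q = vocab_distribution(candidate)
    eps = 1e-12  # numerical floor to keep the logs finite
    return torch.sum(p * (p.add(eps).log() - q.add(eps).log())).item()

print(infolm_like("the cat sat on the rug", "a cat was sitting on the mat"))

Because the language model spreads probability mass over synonyms and related tokens, two paraphrases yield nearby distributions even with little exact n-gram overlap, which is precisely the weakness of string-based metrics that the abstract points out.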

Cited by 8 publications (5 citation statements)
References 42 publications
“…Hashimoto et al (2019) used the same foundation to combine human and automatic evaluation in capturing the trade-off between sampling diverse outputs and achieving the highest possible quality. Pillutla et al (2021) and Colombo et al (2022) expand on these insights and a framework by Djolonga et al (2020) to compare the human- and model-distributions by measuring the extent to which they diverge. A similar approach based on information theory estimates the extent to which a generated summary helps reconstruct the article on which the summary is based (Egan et al, 2022).…”
Section: The Status Quo
confidence: 99%
“…However, they cannot compare two strings based on synonyms. InfoLM overcomes this drawback by using a pre-trained masked language model, requiring no additional training, to compute similarity scores between summaries and references as discrete probability distributions over the vocabulary [96].…”
Section: Summarization Evaluation Metrics
confidence: 99%
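As a usage note for the mechanism this statement describes: TorchMetrics ships an InfoLM implementation, and, assuming a recent version of that library, a call looks roughly like the sketch below. The small BERT checkpoint and the sentence pair are illustrative, and idf=False keeps the example self-contained by skipping corpus-level IDF statistics.

# Hedged usage sketch; interface per recent TorchMetrics versions.
from torchmetrics.text.infolm import InfoLM

infolm = InfoLM("google/bert_uncased_L-2_H-128_A-2", idf=False)
preds = ["he read the book because he was interested in world history"]
target = ["he was interested in world history because he read the book"]
print(infolm(preds, target))  # lower divergence means closer distributions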
“…For future work, we plan to study OOD in sequence labelling tasks (Witon* et al, 2018; Colombo* et al, 2020; Chapuis* et al, 2020a; Colombo et al, 2021a), sequence generation (Colombo* et al, 2019; Jalalzai* et al, 2020; Modi et al, 2020; Colombo et al, 2021e), fair classification (Colombo et al, 2021d; Pichler et al, 2022), and multimodal scenarios (Garcia* et al, 2019; Dinkar* et al, 2020), as well as automatic evaluation (Colombo et al, 2021c; Colombo, 2021a; Staerman et al, 2021b).…”
Section: G Future Applications
confidence: 99%