Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, 2017
DOI: 10.18653/v1/e17-2046

Improving ROUGE for Timeline Summarization

Abstract: Current evaluation metrics for timeline summarization either ignore the temporal aspect of the task or require strict date matching. We introduce variants of ROUGE that allow alignment of daily summaries via temporal distance or semantic similarity. We argue for the suitability of these variants in a theoretical analysis and demonstrate it in a battery of task-specific tests.
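
The alignment idea is easy to picture in code. The sketch below is not the authors' implementation; it is a minimal Python illustration, under the assumption that timelines are dicts mapping dates to token lists, with hypothetical helpers (`rouge_1_f1`, `align_by_date`, `aligned_rouge`) showing how daily summaries could be aligned by temporal distance before ROUGE is computed.

```python
import datetime
from collections import Counter


def rouge_1_f1(candidate_tokens, reference_tokens):
    """Plain unigram-overlap (ROUGE-1) F1 between two token lists."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align_by_date(pred_timeline, ref_timeline):
    """Pair each predicted daily summary with the temporally closest
    reference day. Timelines map datetime.date -> token list. This
    aligns by temporal distance only; the paper's semantic variant
    additionally takes content similarity into account."""
    return [
        (tokens, ref_timeline[min(ref_timeline, key=lambda d: abs((d - day).days))])
        for day, tokens in pred_timeline.items()
    ]


def aligned_rouge(pred_timeline, ref_timeline):
    """Average ROUGE-1 F1 over the aligned pairs of daily summaries."""
    pairs = align_by_date(pred_timeline, ref_timeline)
    if not pairs:
        return 0.0
    return sum(rouge_1_f1(p, r) for p, r in pairs) / len(pairs)


pred = {datetime.date(2011, 2, 11): "mubarak resigns as president".split()}
ref = {datetime.date(2011, 2, 12): "president mubarak resigns".split()}
print(aligned_rouge(pred, ref))  # dates differ by a day but still align
```

Under strict date matching, the toy example above would score zero because no predicted date exactly matches a reference date; temporal alignment instead gives the off-by-one-day summary credit.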

Cited by 18 publications (18 citation statements). References 16 publications.

“…Finally, it is worth mentioning the work of Martschat and Markert (2017), where a variant of ROUGE that allows for evaluation of timeline summarization is presented. This novel metric takes into account both temporal and semantic similarity of daily summaries.…”
Section: Task Based Evaluation (mentioning)
confidence: 99%
“…Therefore, researchers developed automated methods to evaluate summaries. Most of these methods are based on the similarity measure between a summary and its original text, but they do not relate the judgment to human judgment [28]. Hence, a recall-oriented method named ROUGE was developed to evaluate the quality of the summary [28]. Here, the system summary and the reference summary (human-generated) are compared to evaluate the summary quality.…”
Section: Evaluation Methods (mentioning)
confidence: 99%
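As a toy illustration of what recall-oriented comparison means here (reusing the hypothetical `rouge_1_f1` from the sketch above, on made-up sentences):

```python
# Toy example: recall-oriented unigram overlap between a system
# summary and a human reference (hypothetical sentences).
system = "police killed the gunman".split()
reference = "the gunman was killed by police".split()
# 4 of the 6 reference unigrams occur in the system output, so
# ROUGE-1 recall is 4/6; rouge_1_f1 combines this with precision.
print(rouge_1_f1(system, reference))  # 0.8
```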
“…Automatic evaluation of TLS is done by ROUGE (Lin, 2004). We report ROUGE-1 and ROUGE-2 F1 scores for the concat, agreement and align+ m:1 metrics for TLS we presented in Martschat and Markert (2017). These metrics perform evaluation by concatenating all daily summaries, evaluating only matching days and evaluating aligned dates based on date and content similarity, respectively.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
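The first two metrics named in this statement can be contrasted with a short, hedged sketch (again reusing `rouge_1_f1` from above). The normalization used here is an assumption for illustration; the published definitions, and the align+ m:1 variant with many-to-one date alignment weighted by content similarity, differ in detail.

```python
def concat_rouge(pred_timeline, ref_timeline):
    """concat: ignore dates and score the concatenation of all
    daily summaries, as in standard (non-temporal) ROUGE usage."""
    pred_all = [t for tokens in pred_timeline.values() for t in tokens]
    ref_all = [t for tokens in ref_timeline.values() for t in tokens]
    return rouge_1_f1(pred_all, ref_all)


def agreement_rouge(pred_timeline, ref_timeline):
    """agreement: score only exactly matching dates. Averaging over
    the union of dates (an assumed normalization) makes unmatched
    days count as zero, penalizing wrong date selection."""
    all_dates = set(pred_timeline) | set(ref_timeline)
    if not all_dates:
        return 0.0
    shared = set(pred_timeline) & set(ref_timeline)
    return sum(
        rouge_1_f1(pred_timeline[d], ref_timeline[d]) for d in shared
    ) / len(all_dates)
```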