Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM '17)
DOI: 10.1145/3132847.3133000

A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries

Abstract: There is growing interest in systems that generate timeline summaries by filtering high-volume streams of documents to retain only those that are relevant to a particular event or topic. Continued advances in algorithms and techniques for this task depend on standardized and reproducible evaluation methodologies for comparing systems. However, timeline summary evaluation is still in its infancy, with competing methodologies currently being explored in international evaluation forums such as TREC. One area …

Cited by 4 publications (11 citation statements) · References 23 publications
“…Evaluation of summarization algorithms is traditionally done using ROUGE scores (based on unigram/bigram overlap with gold standard summaries), but these measures are not sufficient for timeline summarization methods. Nugget-based or cluster-based evaluation methods have recently been shown to be more effective, but they require a lot of annotation effort (Baruah et al. 2017).…”
Section: RC3: Summarization of Social Media Content Streams
Mentioning, confidence: 99%
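The quoted statement above characterizes ROUGE as unigram/bigram overlap against gold-standard summaries. The following is a minimal Python sketch of that idea as ROUGE-N recall; the function names and the whitespace tokenization are illustrative assumptions, not the official ROUGE toolkit or any system from the cited papers.

    # Minimal sketch of ROUGE-N recall as n-gram overlap with a gold-standard summary.
    # Assumptions: whitespace tokenization, lowercasing; names are illustrative only.
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def rouge_n_recall(system_summary, gold_summary, n=2):
        sys_counts = ngrams(system_summary.lower().split(), n)
        gold_counts = ngrams(gold_summary.lower().split(), n)
        if not gold_counts:
            return 0.0
        overlap = sum(min(count, sys_counts[gram]) for gram, count in gold_counts.items())
        return overlap / sum(gold_counts.values())

    # Example: ROUGE-2 (bigram) recall of a one-sentence system summary.
    print(rouge_n_recall("the storm hit the coast on monday",
                         "a severe storm hit the coast monday", n=2))

As the quoted passage notes, this kind of surface overlap is what the citing authors consider insufficient for timeline summarization, which motivates the nugget- and cluster-based alternatives compared in the paper above.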
“…of systems. As a result, it is unclear to what extent the test collections produced during these tracks can be used to evaluate the quality of new systems that were not pooled for judging [5].…”
Section: Timestamp
Mentioning, confidence: 99%
“…Second, as information clusters within the TREC Real-time Summarization track during 2016 and 2017. We choose to use the TREC Temporal Summarization implementation as the basis for the study in this paper as it is the more complex/costly to deploy of the two (due to the more fine-grained definition of atomic information units used) and because it enables a more detailed comparison of systems [5]. We discuss this implementation below.…”
Section: Timeline Summaries and Evaluation
Mentioning, confidence: 99%
“…More precisely, events 1-10 are assigned to a 'TTG-2013' label set, events 11-25 to a 'TTG-2014' label set and events 26-46 to a 'TTG-2015' label set. We use these label sets later to provide an approximate comparison of the performance of our proposed approaches to the TREC best participating systems for each year.…”
Section: 'TREC-TS-201X' and 'TTG-201X' Label Sets
Mentioning, confidence: 99%
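The quoted statement describes a simple assignment of events to per-year label sets. The sketch below only illustrates that mapping in Python, assuming the event-id ranges quoted above (1-10, 11-25, 26-46); the helper name is hypothetical and not from the citing paper.

    # Sketch of the per-year label-set split described in the quoted passage.
    # Ranges are taken from the quote; the function name is an assumption.
    def ttg_label_set(event_id):
        if 1 <= event_id <= 10:
            return "TTG-2013"
        if 11 <= event_id <= 25:
            return "TTG-2014"
        if 26 <= event_id <= 46:
            return "TTG-2015"
        raise ValueError(f"event {event_id} is outside the 46 labeled events")

    label_sets = {year: [e for e in range(1, 47) if ttg_label_set(e) == year]
                  for year in ("TTG-2013", "TTG-2014", "TTG-2015")}
    print({year: len(events) for year, events in label_sets.items()})  # sizes: 10, 15, 21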
“…In particular, the labeling methodologies used to create the nuggets and matches, the interfaces and support tools used to do the matching, as well as the assessor profiles differ between the TREC-TS original assessments ('TREC-TS-201X' label sets) and the label sets derived from 'TTG-All' ('TTG-201X' label sets). For those interested in examining the differences between these methodologies in more detail, we recommend reading the study by Baruah et al. [4].…”
Section: 'TREC-TS-201X' and 'TTG-201X' Label Sets
Mentioning, confidence: 99%