Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), 2004
DOI: 10.3115/1220355.1220432

Corpus and evaluation measures for multiple document summarization with multiple sources

Abstract: In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and evaluation measures. The significant feature of the corpus is that it annotates not only the important sentences in a document set, but also those among them that have the same content. Moreover, we define new evaluation metrics taking redundancy into account and discuss the effectiveness of redundancy minimization.

Cited by 15 publications (18 citation statements); references 15 publications (15 reference statements).

“…Sentences not containing such clauses are rejected. The intuitive motivation in that the entity is related to part of the ngram via the adverbial particle.…”
Section: Alignment Anchors (mentioning; confidence 99%)

“…Many natural-language intensive applications make such decisions internally. In document summarization, the generated summaries have a higher quality if redundant information has been discarded by detecting text fragments with the same meaning [1]. In information extraction, extraction templates will not be filled consistently whenever there is a mismatch in the trigger word or the applicable extraction pattern [2].…”
Section: Introduction (mentioning; confidence 99%)

“…We use the TSC-3 corpus (Hirao et al, 2004) for evaluation. It is an evaluation corpus for multidocument summarization and was used in Text Summarization Challenge 3.…”
Section: Data (mentioning; confidence 99%)

“…The automatic detection of paraphrases is important in document summarization, to improve the quality of the generated summaries [1]; information extraction, to alleviate the mismatch in the trigger word or the applicable extraction pattern [2]; and question answering, to prevent a relevant document passage from being discarded due to the inability to match a question phrase deemed as very important [3].…”
Section: Motivation (mentioning; confidence 99%)