Proceedings of the COLING/ACL on Main Conference Poster Sessions - 2006
DOI: 10.3115/1273073.1273093

Topic-focused multi-document summarization using an approximate oracle score

Abstract: We consider the problem of producing a multi-document summary given a collection of documents. Since most successful methods of multi-document summarization are still largely extractive, in this paper, we explore just how well an extractive method can perform. We introduce an "oracle" score, based on the probability distribution of unigrams in human summaries. We then demonstrate that with the oracle score, we can generate extracts which score, on average, better than the human summaries, when evaluated with R…
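As a rough illustration of what a score "based on the probability distribution of unigrams in human summaries" could look like, the Python sketch below estimates, for each term, the fraction of human summaries that contain it, and scores a candidate sentence by the average of those estimates over its distinct terms. This is only one plausible reading of the abstract, not the paper's confirmed formula; the function names `term_probabilities` and `oracle_score` and the toy data are hypothetical.

```python
from collections import Counter


def term_probabilities(human_summaries):
    """Estimate P(term) as the fraction of human summaries containing it.

    Assumption: each summary is a list of already-tokenized terms.
    """
    n = len(human_summaries)
    counts = Counter()
    for summary in human_summaries:
        counts.update(set(summary))  # count each term once per summary
    return {term: c / n for term, c in counts.items()}


def oracle_score(sentence, probs):
    """Average estimated probability over the distinct terms of a sentence."""
    terms = set(sentence)
    if not terms:
        return 0.0
    return sum(probs.get(t, 0.0) for t in terms) / len(terms)


# Toy data, invented for illustration only.
summaries = [["quake", "hit", "city"], ["quake", "damage"]]
probs = term_probabilities(summaries)
print(oracle_score(["quake", "damage"], probs))
```

An extractive summarizer built on such a score would rank candidate sentences by it and select from the top, which is why the score is an "oracle": it needs the human summaries that a real system would not have.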

Cited by 69 publications (51 citation statements)
References: 7 publications
“…Summarization systems that directly optimize for more topic signatures during content selection have fared very well in evaluations (Conroy et al, 2006). Hence the number of topic signatures from the input present in a summary might be a good indicator of summary content quality.…”
Section: Use Of Topic Words In the Summary (mentioning)
confidence: 99%
“…The scored sentences (a ranked list) are passed to an antiredundancy component for summary sentence selection. TsSum [2] relies on the computation of topic words [10], which are words that occur more often in the input text than in a large background corpus. The log-likelihood ratio test is applied, with a threshold parameter used to determine topic words from non-topic words.…”
Section: Reproducible Experimental Setup (mentioning)
confidence: 99%
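The topic-word test described in the statement above can be sketched in Python as follows. This is a minimal illustration using the standard binomial log-likelihood ratio statistic, not the TsSum implementation itself; the function names and toy data are hypothetical, and the default threshold of 10.83 (roughly the chi-square critical value for p = 0.001 with one degree of freedom) is an assumed setting, not one taken from the quoted paper.

```python
import math
from collections import Counter


def _xlogy(x, y):
    # Treat 0 * log(0) as 0 so that zero counts do not raise errors.
    return 0.0 if x == 0 else x * math.log(y)


def _log_likelihood(k, n, p):
    # Binomial log-likelihood of k occurrences in n tokens at rate p.
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def llr(k1, n1, k2, n2):
    """-2 log(lambda): one shared rate versus separate rates per corpus."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)
    return 2.0 * (
        _log_likelihood(k1, n1, p1) + _log_likelihood(k2, n2, p2)
        - _log_likelihood(k1, n1, p) - _log_likelihood(k2, n2, p)
    )


def topic_words(input_tokens, background_tokens, threshold=10.83):
    """Words significantly more frequent in the input than in the background."""
    inp, bg = Counter(input_tokens), Counter(background_tokens)
    n1, n2 = len(input_tokens), len(background_tokens)
    result = set()
    for word, k1 in inp.items():
        k2 = bg[word]  # Counter returns 0 for unseen words
        if k1 / n1 > k2 / n2 and llr(k1, n1, k2, n2) > threshold:
            result.add(word)
    return result


# Toy corpora, invented for illustration only.
input_tokens = ["quake"] * 20 + ["the"] * 20
background_tokens = ["the"] * 500 + ["quake"] + ["other"] * 499
print(topic_words(input_tokens, background_tokens))
```

Here "the" has the same relative frequency in both corpora, so its statistic is zero and it is not selected, while "quake" is far more frequent in the input and passes the threshold.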
“…Number of topic signature words (Lin and Hovy, 2000; Conroy et al, 2006) and percentage of signature words in the vocabulary. Document similarity in the input set These features apply to multi-document summarization only.…”
Section: Log-likelihood Ratio For Words In the Input (mentioning)
confidence: 99%