Corpus and evaluation measures for multiple document summarization with multiple sources

Hirao, Tsutomu; Fukusima, Takahiro; Okumura, Manabu; Nobata, Chikashi; Nanba, Hidetsugu

doi:10.3115/1220355.1220432

Cited by 15 publications

(18 citation statements)

References 15 publications

(15 reference statements)

“…Sentences not containing such clauses are rejected. 1 The intuitive motivation in that the entity is related to part of the ngram via the adverbial particle.…”

Section: Alignment Anchors (mentioning)

confidence: 99%

“…Many natural-language intensive applications make such decisions internally. In document summarization, the generated summaries have a higher quality if redundant information has been discarded by detecting text fragments with the same meaning [1]. In information extraction, extraction templates will not be filled consistently whenever there is a mismatch in the trigger word or the applicable extraction pattern [2].…”

Section: Introduction (mentioning)

confidence: 99%

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

Paşca

Dienes

2005

Lecture Notes in Computer Science

This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not require any explicit clue indicating which documents are likely to encode parallel paraphrases, as they report on the same events or describe the same stories. Large sets of paraphrases are collected through exhaustive pairwise alignment of small needles, i.e., sentence fragments, across a haystack of Web document sentences. The paper describes experiments on a set of about one billion Web documents, and evaluates the extracted paraphrases in a natural-language Web search application.

“…We use the TSC-3 corpus (Hirao et al., 2004) for evaluation. It is an evaluation corpus for multidocument summarization and was used in Text Summarization Challenge 3.…”

Section: Data (mentioning)

confidence: 99%

Multi-Document Summarization Model Based on Redundancy-Constrained Knapsack Problem

Nishikawa

Hirao

Makino et al.

2013

Journal of Natural Language Processing

Self Cite

In this study, we regard multi-document summarization as a redundancy-constrained knapsack problem. The summarization model based on this formulation is obtained by adding a constraint that curbs redundancy in the summary to a summarization model based on the Knapsack problem. As the redundancy-constrained knapsack problem is an NP-hard problem and its computational cost is high, we propose a fast decoding method based on the Lagrange heuristic to quickly locate an approximate solution. Experiments based on ROUGE evaluation show that our proposed model outperforms the state-of-the-art text summarization model, the maximum coverage model, in finding the optimal solution. We also show that our decoding method finds a good approximate solution, which is comparable to the optimal solution of the maximum coverage model, more than 100 times faster than an integer linear programming solver.

“…The automatic detection of paraphrases is important in document summarization, to improve the quality of the generated summaries [1]; information extraction, to alleviate the mismatch in the trigger word or the applicable extraction pattern [2]; and question answering, to prevent a relevant document passage from being discarded due to the inability to match a question phrase deemed as very important [3].…”

Section: Motivation (mentioning)

confidence: 99%

Mining Paraphrases from Self-anchored Web Sentence Fragments

Paşca

2005

Knowledge Discovery in Databases: PKDD 2005

Near-synonyms or paraphrases are beneficial in a variety of natural language and information retrieval applications, but so far their acquisition has been confined to clean, trustworthy collections of documents with explicit external attributes. When such attributes are available, such as similar time stamps associated to a pair of news articles, previous approaches rely on them as signals of potentially high content overlap between the articles, often embodied in sentences that are only slight, paraphrase-based variations of each other. This paper introduces a new unsupervised method for extracting paraphrases from an information source of completely different nature and scale, namely unstructured text across arbitrary Web textual documents. In this case, no useful external attributes are consistently available for all documents. Instead, the paper introduces linguistically-motivated text anchors, which are identified automatically within the documents. The anchors are instrumental in the derivation of paraphrases through lightweight pairwise alignment of Web sentence fragments. A large set of categorized names, acquired separately from Web documents, serves as a filtering mechanism for improving the quality of the paraphrases. A set of paraphrases extracted from about a billion Web documents is evaluated both manually and through its impact on a natural-language Web search application.
