2020
DOI: 10.48550/arxiv.2005.10070
Preprint

A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal

Abstract: Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries and has important applications in story clustering for newsfeeds, presentation of search results, and timeline generation. However, there is a lack of datasets that realistically address such use cases at a scale large enough for training supervised models for this task. This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individu…

Cited by 1 publication (2 citation statements)
References 3 publications
“…Supervised methods in abstractive summarization always use the encoder-decoder transformer architecture with data sets of large, paired document-summary examples. Ghalandari et al (2020) propose an end-to-end Hierarchical MMR-Attention Pointergenerator (Hi-MAP) model to address the information redundancy. Li et al (2020) develop a neural abstractive MDS model which can leverage similarity graph or discourse graph representations of documents, to more effectively capture cross-document relations.…”
Section: Related Work (mentioning)
confidence: 99%
“…WCEP (Ghalandari et al, 2020). WCEP data set contains human-written summaries of recent news events.…”
Section: Large Language Models (mentioning)
confidence: 99%