Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.161

Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers

Abstract: Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based, with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank…

Cited by 28 publications (18 citation statements)
References 38 publications (66 reference statements)

Citation statements, ordered by relevance:
“…After we re-implement the trigram blocking trick (i.e., removing sentences that repeat trigrams already present in the summary) used by STAS (Xu et al., 2020), FAR achieves better ROUGE-1/2/L scores of 40.93/17.80/37.00 than STAS on CNN/DM. Table 3 reports the results on the long document summarization (LDS) datasets arXiv, PubMed and BillSum.…”
Section: Results on SDS
confidence: 99%
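
To make the trigram-blocking trick concrete, here is a minimal Python sketch of the greedy selection it describes, assuming sentences arrive already ranked; the function and variable names are illustrative and not taken from the STAS or FAR code.

```python
from typing import List, Set, Tuple


def _trigrams(tokens: List[str]) -> Set[Tuple[str, ...]]:
    """Word trigrams of a tokenized sentence."""
    return {tuple(tokens[i:i + 3]) for i in range(max(len(tokens) - 2, 0))}


def select_with_trigram_blocking(ranked: List[str], k: int = 3) -> List[str]:
    """Greedily keep top-ranked sentences, skipping any that repeat a
    trigram already present in the partial summary (trigram blocking)."""
    summary: List[str] = []
    seen: Set[Tuple[str, ...]] = set()
    for sent in ranked:
        tri = _trigrams(sent.lower().split())
        if tri & seen:   # shares a trigram with the summary so far -> blocked
            continue
        summary.append(sent)
        seen |= tri
        if len(summary) == k:
            break
    return summary
```
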
“…Dong et al. (2020) point out that PACSUM has a position bias that makes it unsuitable for long document summarization, and propose HipoRank, a hierarchical position-based model for scientific document summarization. STAS (Xu et al., 2020) designs two summarization-related pre-training tasks to improve sentence representations. They then propose a ranking method that combines attention weights with a reconstruction loss to measure the centrality of sentences.…”
Section: Related Work
confidence: 99%
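
The two pre-training tasks credited to STAS here can be illustrated with a small data-construction sketch; the helper names and the mask token are hypothetical, and the exact instance format in Xu et al. (2020) may differ.

```python
import random
from typing import List, Tuple

MASK_SENT = "[SENT_MASK]"  # hypothetical placeholder token for a masked sentence


def masked_sentence_example(sents: List[str]) -> Tuple[List[str], int, str]:
    """Replace one sentence with a mask token; the target is the original sentence."""
    i = random.randrange(len(sents))
    inp = sents[:i] + [MASK_SENT] + sents[i + 1:]
    return inp, i, sents[i]


def sentence_shuffling_example(sents: List[str]) -> Tuple[List[str], List[int]]:
    """Shuffle the sentences; the target is each shuffled sentence's original position."""
    order = list(range(len(sents)))
    random.shuffle(order)
    shuffled = [sents[j] for j in order]
    return shuffled, order
```

Both targets are derived from the document itself, so only unlabeled documents are needed, consistent with the unsupervised setting described in the abstract.
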
“…Considering this challenge, Zhang et al. [2019b] and Xu et al. [2020b] proposed hierarchical BERT to learn interactions between sentences with self-attention for document encoding. In addition, to capture inter-sentential relations, DiscoBERT [Xu et al., 2020a] stacked a graph convolutional network (GCN) on top of BERT to model structural discourse graphs. By operating directly on discourse units, DiscoBERT retains the capacity to include more concepts and contexts, leading to more concise and informative output text.…”
Section: Unstructured Input
confidence: 99%
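
As a rough illustration of the two designs contrasted in this statement (inter-sentence self-attention over sentence vectors, and a GCN stacked on sentence representations), here is a hedged PyTorch sketch; the dimensions, layer counts, and the way sentence vectors and the discourse adjacency matrix are produced are assumptions, not details of HIBERT, STAS, or DiscoBERT.

```python
import torch
import torch.nn as nn


class SentenceLevelEncoder(nn.Module):
    """Inter-sentence self-attention over precomputed sentence vectors
    (the second stage of a hierarchical encoder; token-level encoding is assumed done elsewhere)."""

    def __init__(self, d_model: int = 768, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, sent_vecs: torch.Tensor) -> torch.Tensor:
        # sent_vecs: (batch, num_sentences, d_model)
        return self.encoder(sent_vecs)


class GraphConvLayer(nn.Module):
    """One graph-convolution step over sentence nodes: H' = ReLU(A_norm @ H @ W)."""

    def __init__(self, d_model: int = 768):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, sent_vecs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (batch, num_sentences, num_sentences), row-normalized discourse graph
        return torch.relu(adj @ self.proj(sent_vecs))
```
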
“…Closer to our task is the work in [45], where HIBERT, a hierarchical transformer (again based on BERT), was first pre-trained in an unsupervised fashion and then fine-tuned on a supervised extractive summarization task in which every sentence of each document is labeled as belonging to the summary of that document or not. Following this work, the authors of [46] proposed to pre-train a hierarchical transformer model with a masked sentence prediction task (the model must predict a masked sentence) and a sentence shuffling task (the model must predict the original order of shuffled sentences). The pre-trained hierarchical encoder is then used to compute a ranking score for the sentences from its self-attention weight matrix, obtained by averaging over the heads of each layer and then averaging over the layers.…”
Section: Hierarchy in Transformer Models
confidence: 99%
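
The ranking computation described above (average the sentence-level self-attention over heads within each layer, then over layers, then score sentences from the resulting matrix) can be sketched as follows; treating a sentence's attention column sum as its score is one plausible aggregation chosen for illustration, not necessarily the exact formula of [46].

```python
import numpy as np


def sentence_scores_from_attention(attn: np.ndarray) -> np.ndarray:
    """attn: (num_layers, num_heads, n_sents, n_sents) sentence-level self-attention.
    Returns one centrality-style score per sentence."""
    per_layer = attn.mean(axis=1)   # average over heads within each layer
    avg = per_layer.mean(axis=0)    # then average over layers -> (n_sents, n_sents)
    # Score sentence j by how much attention all sentences pay to it (column sum);
    # this aggregation rule is an assumption for illustration.
    return avg.sum(axis=0)


# Example: rank the sentences of a 5-sentence document from random attention weights.
rng = np.random.default_rng(0)
attn = rng.random((6, 8, 5, 5))
attn /= attn.sum(axis=-1, keepdims=True)   # rows sum to 1, like softmax attention
ranking = np.argsort(-sentence_scores_from_attention(attn))
print(ranking)  # sentence indices, most central first
```
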