Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.161

Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers

Abstract: Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based, with sentences as nodes and edge weights measured by sentence similarities. In this work, we find that transformer attentions can be used to rank sentences for unsupervised extractive summarization. Specifically, we first pre-train a hierarchical transformer model using unlabeled documents only. Then we propose a method to rank…

Cited by 28 publications (18 citation statements)
References 38 publications (66 reference statements)

Citation statements, ordered by relevance:
“…After we re-implement the trigram blocking trick (i.e., removing sentences that repeat trigrams already present in the summary) used by STAS (Xu et al., 2020), FAR achieves better ROUGE-1/2/L scores of 40.93/17.80/37.00 than STAS on CNN/DM. Table 3 reports the results on the long document summarization (LDS) datasets arXiv, PubMed and BillSum.…”
Section: Results on SDS
confidence: 99%
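
To make the trigram-blocking trick concrete, here is a minimal Python sketch of the greedy selection it describes, assuming sentences arrive already ranked; the function and variable names are illustrative and not taken from the STAS or FAR code.

```python
from typing import List, Set, Tuple


def _trigrams(tokens: List[str]) -> Set[Tuple[str, ...]]:
    """Word trigrams of a tokenized sentence."""
    return {tuple(tokens[i:i + 3]) for i in range(max(len(tokens) - 2, 0))}


def select_with_trigram_blocking(ranked: List[str], k: int = 3) -> List[str]:
    """Greedily keep top-ranked sentences, skipping any that repeat a
    trigram already present in the partial summary (trigram blocking)."""
    summary: List[str] = []
    seen: Set[Tuple[str, ...]] = set()
    for sent in ranked:
        tri = _trigrams(sent.lower().split())
        if tri & seen:   # shares a trigram with the summary so far -> blocked
            continue
        summary.append(sent)
        seen |= tri
        if len(summary) == k:
            break
    return summary
```
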
“…Dong et al. (2020) point out that PACSUM has a position bias that makes it unsuitable for long document summarization, and propose HipoRank, a hierarchical position-based model for scientific document summarization. STAS (Xu et al., 2020) designs two summarization-related pre-training tasks to improve sentence representations. They then propose a ranking method that combines attention weights with a reconstruction loss to measure the centrality of sentences.…”
Section: Related Work
confidence: 99%
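
The two pre-training tasks credited to STAS here can be illustrated with a small data-construction sketch; the helper names and the mask token are hypothetical, and the exact instance format in Xu et al. (2020) may differ.

```python
import random
from typing import List, Tuple

MASK_SENT = "[SENT_MASK]"  # hypothetical placeholder token for a masked sentence


def masked_sentence_example(sents: List[str]) -> Tuple[List[str], int, str]:
    """Replace one sentence with a mask token; the target is the original sentence."""
    i = random.randrange(len(sents))
    inp = sents[:i] + [MASK_SENT] + sents[i + 1:]
    return inp, i, sents[i]


def sentence_shuffling_example(sents: List[str]) -> Tuple[List[str], List[int]]:
    """Shuffle the sentences; the target is each shuffled sentence's original position."""
    order = list(range(len(sents)))
    random.shuffle(order)
    shuffled = [sents[j] for j in order]
    return shuffled, order
```

Both targets are derived from the document itself, so only unlabeled documents are needed, consistent with the unsupervised setting described in the abstract.
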
“…Considering this challenge, Zhang et al. [2019b] and Xu et al. [2020b] proposed hierarchical BERT to learn interactions between sentences with self-attention for document encoding. In addition, to capture inter-sentential relations, DiscoBERT [Xu et al., 2020a] stacked a graph convolutional network (GCN) on top of BERT to model structural discourse graphs. By operating directly on discourse units, DiscoBERT retains the capacity to include more concepts and contexts, leading to more concise and informative output text.…”
Section: Unstructured Input
confidence: 99%
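
As a rough illustration of the two designs contrasted in this statement (inter-sentence self-attention over sentence vectors, and a GCN stacked on sentence representations), here is a hedged PyTorch sketch; the dimensions, layer counts, and the way sentence vectors and the discourse adjacency matrix are produced are assumptions, not details of HIBERT, STAS, or DiscoBERT.

```python
import torch
import torch.nn as nn


class SentenceLevelEncoder(nn.Module):
    """Inter-sentence self-attention over precomputed sentence vectors
    (the second stage of a hierarchical encoder; token-level encoding is assumed done elsewhere)."""

    def __init__(self, d_model: int = 768, n_heads: int = 8, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, sent_vecs: torch.Tensor) -> torch.Tensor:
        # sent_vecs: (batch, num_sentences, d_model)
        return self.encoder(sent_vecs)


class GraphConvLayer(nn.Module):
    """One graph-convolution step over sentence nodes: H' = ReLU(A_norm @ H @ W)."""

    def __init__(self, d_model: int = 768):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, sent_vecs: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj: (batch, num_sentences, num_sentences), row-normalized discourse graph
        return torch.relu(adj @ self.proj(sent_vecs))
```
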
“…Closer to our task is the work in [45], where HIBERT, a hierarchical transformer (again based on BERT), was first pre-trained in an unsupervised fashion and then fine-tuned on a supervised extractive summarization task in which every sentence of each document is labeled as belonging to the summary of that document or not. Following this work, the authors of [46] proposed to pre-train a hierarchical transformer model with a masked sentence prediction task (the model must predict a masked sentence) and a sentence shuffling task (the model must predict the original order of shuffled sentences). The pre-trained hierarchical encoder is then used to compute a ranking score for the sentences from its self-attention weight matrix, obtained by averaging over the heads of each layer and then averaging over the layers.…”
Section: Hierarchy in Transformer Models
confidence: 99%
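
The ranking computation described above (average the sentence-level self-attention over heads within each layer, then over layers, then score sentences from the resulting matrix) can be sketched as follows; treating a sentence's attention column sum as its score is one plausible aggregation chosen for illustration, not necessarily the exact formula of [46].

```python
import numpy as np


def sentence_scores_from_attention(attn: np.ndarray) -> np.ndarray:
    """attn: (num_layers, num_heads, n_sents, n_sents) sentence-level self-attention.
    Returns one centrality-style score per sentence."""
    per_layer = attn.mean(axis=1)   # average over heads within each layer
    avg = per_layer.mean(axis=0)    # then average over layers -> (n_sents, n_sents)
    # Score sentence j by how much attention all sentences pay to it (column sum);
    # this aggregation rule is an assumption for illustration.
    return avg.sum(axis=0)


# Example: rank the sentences of a 5-sentence document from random attention weights.
rng = np.random.default_rng(0)
attn = rng.random((6, 8, 5, 5))
attn /= attn.sum(axis=-1, keepdims=True)   # rows sum to 1, like softmax attention
ranking = np.argsort(-sentence_scores_from_attention(attn))
print(ranking)  # sentence indices, most central first
```
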