Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-2097

A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

Abstract: Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary. Empirical results on two large-scale datasets of scientific papers show that our model significantly outperforms state…
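The architecture sketched in the abstract can be read as attention at two granularities: a distribution over discourse sections and a distribution over words, with each word's weight rescaled by the weight of its section. The following is a minimal sketch of that idea, assuming PyTorch, a dot-product score, and illustrative tensor names; it is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def discourse_aware_attention(dec_state, section_reprs, word_reprs, section_ids):
    """Combine section-level and word-level attention into one context vector.

    dec_state:     (hidden,)            decoder state at the current step
    section_reprs: (n_sections, hidden) one encoder vector per discourse section
    word_reprs:    (n_words, hidden)    encoder states for all words
    section_ids:   (n_words,)           section index of each word (long tensor)
    """
    # Section-level attention: how relevant is each discourse section now?
    # A dot-product score is assumed here in place of the paper's scoring function.
    sec_weights = F.softmax(section_reprs @ dec_state, dim=0)   # (n_sections,)

    # Word-level attention over the whole document.
    word_weights = F.softmax(word_reprs @ dec_state, dim=0)     # (n_words,)

    # Rescale each word by its section's weight, then renormalize so the
    # final distribution still sums to one.
    combined = word_weights * sec_weights[section_ids]
    combined = combined / combined.sum()

    # Context vector fed to the decoder when generating the next word.
    return combined @ word_reprs                                # (hidden,)
```

The effect is that words in currently relevant sections dominate the context vector, which is what lets the decoder track a long document's discourse structure rather than attending over all words uniformly.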

Cited by 450 publications (533 citation statements). References 29 publications.
“…Earlier on, it was realized that summarizing scientific papers requires different approaches than those used for summarizing news articles, due to differences in document length (in Cohan et al. (2018), length is measured in number of words), writing style, and rhetorical structure. For instance, Teufel and Moens (2002) presented a supervised Naive Bayes classifier to select content from a scientific paper based on the rhetorical status of each sentence (e.g., whether it specified a research goal, or some generally accepted scientific background knowledge). More recently, researchers have extended this work by applying more sophisticated classifiers to identify finer-grained rhetorical categories, as well as by exploiting citation contexts.…”
Section: Extractive Summarization on Scientific Papers (mentioning; confidence: 99%)
“…Record encoders with a record fusion gate provide record-level representations, and the row-level encoder provides row-level representations. Inspired by Cohan et al. (2018), we can modify the decoder in the base model to first choose an important row and then attend to its records when generating each word. Following the notation in Section 2.3, β_{t,i} ∝ exp(score(d_t, row_i)) gives the attention weight with respect to each row.…”
Section: Decoder with Dual Attention (mentioning; confidence: 99%)
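The excerpt transfers the same two-level scheme from discourse sections to table rows: the row-level weights β_{t,i} gate record-level attention. A minimal sketch under the same assumptions as the earlier one (PyTorch, dot-product score, illustrative names; not the cited paper's code):

```python
import torch
import torch.nn.functional as F

def dual_attention(d_t, row_reprs, record_reprs, row_ids):
    """Row-then-record dual attention, mirroring discourse-aware attention.

    d_t:          (hidden,)            decoder state at step t
    row_reprs:    (n_rows, hidden)     row-level representations
    record_reprs: (n_records, hidden)  record-level representations
    row_ids:      (n_records,)         row index of each record (long tensor)
    """
    # beta_{t,i} ∝ exp(score(d_t, row_i)); a dot-product score is assumed.
    beta = F.softmax(row_reprs @ d_t, dim=0)                    # (n_rows,)

    # Record-level attention, gated by the weight of each record's row.
    alpha = F.softmax(record_reprs @ d_t, dim=0)                # (n_records,)
    weights = alpha * beta[row_ids]
    weights = weights / weights.sum()

    # Context vector used when generating the next word.
    return weights @ record_reprs                               # (hidden,)
```

Rows and records here play the roles that sections and words play in the summarization model; the gating and renormalization are identical.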
“…In addition, we implemented three other hierarchical encoders that encode the tables' row-dimension information at both the record level and the row level, to compare with the hierarchical encoder structure in our model. The decoder was equipped with dual attention (Cohan et al., 2018). The one with an LSTM cell is similar to that of Cohan et al. (2018), with the number of layers chosen from {1, 2, 3}.…”
Section: Automatic Evaluation (mentioning; confidence: 99%)
“…is to analyze and understand the impact on the models' generalization ability from a dataset perspective. With the emergence of more and more summarization datasets (Sandhaus, 2008; Nallapati et al., 2016; Cohan et al., 2018; Grusky et al., 2018), the time is ripe for us to bridge the gap between the insufficient understanding of the nature of the datasets themselves and the increasing improvement of the learning methods.…”
Section: Introduction (mentioning; confidence: 99%)