Query-focused multi-document summarization: automatic data annotations and supervised learning approaches

Chali, Yllias; Hasan, Sadid A.

doi:10.1017/s1351324911000167

Cited by 27 publications

(22 citation statements)

References 29 publications

“…(ii) solving an optimization problem [20,8,27]: these approaches cast the summarization problem as an optimization problem where an objective function needs to be optimized with respect to some constraints. (iii) supervised models [60,25,18], where selection of sentences in the summary are learned using a supervised framework. (iv) graph based [29,53,62]: these approaches seek to find the most central sentences in a document's graph where sentences are nodes and edges are similarities.…”

Section: Text Summarization (mentioning)

confidence: 99%
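The graph-based family described in the statement above can be illustrated with a minimal sketch. It assumes bag-of-words cosine similarity as the edge weight and weighted degree as the centrality measure; the function names and the `threshold` parameter are hypothetical and are not taken from any of the cited systems.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def degree_centrality(sentences, threshold=0.1):
    """Score each sentence (node) by the summed weight of its edges.

    An edge exists only when similarity reaches `threshold`
    (an assumed cut-off chosen for illustration).
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for i, bag_i in enumerate(bags):
        total = 0.0
        for j, bag_j in enumerate(bags):
            if i == j:
                continue
            sim = cosine(bag_i, bag_j)
            if sim >= threshold:
                total += sim
        scores.append(total)
    return scores


def most_central(sentences, k=1, threshold=0.1):
    """Return the k highest-scoring sentences in document order."""
    scores = degree_centrality(sentences, threshold)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Iterative centrality measures computed over the same similarity graph are a common alternative to weighted degree; the simpler measure is used here only to keep the sketch short.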

Scientific document summarization via citation contextualization and scientific discourse

Cohan

Goharian

2017

Int J Digit Libr

The rapid growth of scientific literature has made it difficult for the researchers to quickly learn about the developments in their respective fields. Scientific document summarization addresses this challenge by providing summaries of the important contributions of scientific papers. We present a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure. Citation texts often lack the evidence and context to support the content of the cited paper and are even sometimes inaccurate. We first address the problem of inaccuracy of the citation texts by finding the relevant context from the cited paper. We propose three approaches for contextualizing citations which are based on query reformulation, word embeddings, and supervised learning. We then train a model to identify the discourse facets for each citation. We finally propose a method for summarizing scientific papers by leveraging the faceted citations and their corresponding contexts. We evaluate our proposed method on two scientific summarization datasets in the biomedical and computational linguistics domains. Extensive evaluation results show that our methods can improve over the state of the art by large margins. * This is a pre-print of an article published on IJDL. The final publication is available at Springer via http://dx.

“…Up to now, various extraction-based techniques have been proposed for generic document summarization [28]. In automatic document summarization, the selection process of the distinct ideas included in the document is called diversity. The diversity is very important evidence serving to control the redundancy in the summarized text and produce more appropriate summary.…”

Section: Related Work (mentioning)

confidence: 99%

MR&MR-SUM: MAXIMUM RELEVANCE AND MINIMUM REDUNDANCY DOCUMENT SUMMARIZATION MODEL

Alguliyev

Alıguliyev

Isazade

2013

Int. J. Info. Tech. Dec. Mak.

We have presented an approach to automatic document summarization. In the proposed approach, text summarization is modeled as a quadratic integer-programming problem. This model generally attempts to optimize three properties, namely, (1) relevance: summary should contain informative textual units that are relevant to the user; (2) redundancy: summaries should not contain multiple textual units that convey the same information; and (3) length: summary is bounded in length. To solve the optimization problem we have created a novel differential evolution algorithm. Experimental results on DUC2005 and DUC2007 data sets showed that the proposed approach outperforms the other methods.
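The three properties named in this abstract can be made concrete with a small sketch. The objective below (total relevance minus a penalty on pairwise similarity, subject to a length bound) is an assumed form chosen for illustration, and the exhaustive search stands in for the paper's differential evolution algorithm, which is not reproduced here. All names and the `penalty` parameter are hypothetical.

```python
from itertools import product


def best_summary(relevance, similarity, lengths, max_len, penalty=1.0):
    """Pick the 0/1 selection vector that maximizes
    relevance minus penalized pairwise redundancy under a length bound.

    Brute force over all 2**n selections, so only usable for tiny inputs.
    """
    n = len(relevance)
    best_score, best_x = float("-inf"), None
    for x in product((0, 1), repeat=n):
        # (3) length: skip selections that exceed the bound
        if sum(l * xi for l, xi in zip(lengths, x)) > max_len:
            continue
        # (1) relevance: linear term over the selected units
        rel = sum(r * xi for r, xi in zip(relevance, x))
        # (2) redundancy: quadratic term over selected pairs
        red = sum(
            similarity[i][j] * x[i] * x[j]
            for i in range(n)
            for j in range(i + 1, n)
        )
        score = rel - penalty * red
        if score > best_score:
            best_score, best_x = score, x
    return best_x, best_score
```

With three equally long sentences where the first two are near-duplicates, the search prefers pairing the most relevant sentence with the dissimilar third one over selecting both near-duplicates.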

“…We use query-focused supervised extractive multi-document summarization technique for this purpose [1][2][3]. Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions [4].…”

Section: Introduction (mentioning)

confidence: 99%
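The (weighted) vote mentioned in the statement above can be sketched as follows. This is a generic illustration of the voting step only; the label strings and the `weighted_vote` helper are hypothetical and do not come from the cited paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine per-classifier labels into one label.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per classifier; equal weights
             (a plain majority vote) are assumed when omitted.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # The label with the largest accumulated weight wins.
    return max(totals, key=totals.get)
```

For example, `weighted_vote(["summary", "non-summary", "summary"])` returns `"summary"`, while weights of `[0.2, 0.9, 0.3]` on the same predictions flip the result to `"non-summary"`.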

“…Supervised classifiers are typically trained on data pairs, defined by feature vectors and corresponding class labels. We use an automatic labeling approach to annotate the training data using ROUGE [1,3,9]. From each sentence of the training (and testing) data, we extract different query-related features and importance-oriented features such as: n-gram overlap, Longest Common Subsequence (LCS), Weighted LCS (WLCS), skip-bigram, exact word overlap, synonym overlap, hypernym/hyponym overlap, gloss overlap, Basic Element (BE) overlap, syntactic tree similarity measure, position of sentences, length of sentences, Named Entity (NE) match, cue word match and title match [1,3,5,13].…”

Section: Introduction (mentioning)

confidence: 99%
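A rough sketch of the automatic labeling idea in the statement above: score each sentence against a reference summary and mark the top-scoring ones as positive examples. Unigram recall is used here as a simplified stand-in for ROUGE, and a word-level LCS length is shown as one example feature. The function names, the `top_k` parameter, and the +1/-1 label encoding are assumptions for illustration, not the cited authors' procedure.

```python
from collections import Counter


def unigram_recall(candidate, reference):
    """Fraction of reference unigrams covered by the candidate
    (a simplified stand-in for a ROUGE-1 recall score)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


def lcs_length(a, b):
    """Length of the longest common subsequence, measured in words."""
    x, y = a.lower().split(), b.lower().split()
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, 1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def auto_label(sentences, reference, top_k=1):
    """Label the top_k sentences by overlap score as +1, the rest as -1."""
    scores = [unigram_recall(s, reference) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    positive = set(ranked[:top_k])
    return [1 if i in positive else -1 for i in range(len(sentences))]
```

The resulting labels, paired with feature values such as `lcs_length(sentence, query)`, would give the (feature vector, class label) pairs that a supervised classifier is trained on.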

Complex Question Answering: Homogeneous or Heterogeneous, Which Ensemble Is Better?

Chali

Hasan

Mojahid

2014

Natural Language Processing and Information Systems

Self Cite

This paper applies homogeneous and heterogeneous ensembles to perform the complex question answering task. For the homogeneous ensemble, we employ Support Vector Machines (SVM) as the learning algorithm and use a Cross-Validation Committees (CVC) approach to form several base models. We use SVM, Hidden Markov Models (HMM), Conditional Random Fields (CRF), and Maximum Entropy (MaxEnt) techniques to build different base models for the heterogeneous ensemble. Experimental analyses demonstrate that both ensemble methods outperform conventional systems and that the heterogeneous ensemble is better.
