Extracting Multi-document Summaries with a Double Clustering Approach

Silveira, Sara; Branco, António

doi:10.1007/978-3-642-31178-9_7

Cited by 9 publications

(10 citation statements)

References 11 publications

“…The goal of this module is to produce summaries from a certain number of texts of the same topic. This is an adaptation of the work presented in [19], with the difference that the module is designed to be language-independent. After a preprocessing stage that uses the NLP module, the process collects all the recognized named entities and keywords of the text, using the well-known Term Frequency-Inverse Document Frequency (TF-IDF) algorithm [20] for this second task.…”

Section: A Language Unit (mentioning)

confidence: 98%
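
The keyword-collection step mentioned in the statement above relies on TF-IDF. The following is a minimal sketch of TF-IDF keyword ranking, not the citing system's implementation: the function names `tokenize` and `tfidf_keywords`, the regex tokenizer, the unsmoothed `log(N / df)` weighting, and the alphabetical tie-breaking are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(documents, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document.

    TF is the relative frequency of a term in a document; IDF is
    log(N / df), where df is the number of documents containing the term.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the house",
]
keywords = tfidf_keywords(docs, top_k=2)
```

A term that occurs in every document (here "the") gets an IDF of zero and therefore never outranks a term specific to one document, which is the property that makes TF-IDF useful for picking keywords.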

Hypatia: An expert system proposal for documentation departments

Garrido

Peiró

Ilarri

2014

2014 IEEE 12th International Symposium on Intelligent Systems and Informatics (SISY)


Nowadays, the vast amount of text-based information stored in organizations requires different approaches and new tools in order to manage it adequately. This paper presents Hypatia, a support expert system for documentation departments and regular users that exploits not only local information, but also external resources from the Web (e.g., Linked Data). The expert system uses different modules: Natural Language Processing (NLP) analysis, categorization, semantic disambiguation, Automatic Query Expansion (AQE), semantic search, summarization, knowledge extraction, and aggregation. Users can interact with the expert system in different ways, varying from giving very specific orders to writing a simple list of keywords. The latter method requires a previous interpretation before deciding the response of the system. The obtained results will benefit from semantic links referencing complementary data to improve both the information presentation and the data navigation.

“…Due to our interest in working with multi-documents, we have analyzed other similar works, for example [11], that use domain-independent techniques based mainly on fast statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages or that use a cluster centroid with techniques such as graph matching, maximal marginal relevance, and language generation. Also we find very interesting the recent contributions of SIMBA [12], which has a smart procedure to simplify sentences to ensure the compression of the summary. To carry out this task, SIMBA applies a two-stage process of clusterization: clustering sentences by similarity and clustering sentences by keyword.…”

Section: Automatic Summaries (mentioning)

confidence: 98%
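
The two-stage clustering described in the statement above (first by sentence similarity, then by keyword) can be sketched roughly as follows. This is an illustrative toy, not SIMBA's actual algorithm: the Jaccard word-overlap measure, the greedy single-pass clustering, the 0.5 threshold, the choice of the first sentence as cluster representative, and all function names are assumptions.

```python
import re


def words(sentence):
    """Return the set of lowercase alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(sentences, threshold=0.5):
    """Stage 1: greedily group near-duplicate sentences.

    Each sentence joins the first cluster whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for sentence in sentences:
        for cluster in clusters:
            if jaccard(words(sentence), words(cluster[0])) >= threshold:
                cluster.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters


def cluster_by_keyword(representatives, keywords):
    """Stage 2: assign each representative to the first keyword it contains.

    Representatives that contain none of the keywords are left out.
    """
    groups = {keyword: [] for keyword in keywords}
    for sentence in representatives:
        sentence_words = words(sentence)
        for keyword in keywords:
            if keyword in sentence_words:
                groups[keyword].append(sentence)
                break
    return groups


def double_cluster(sentences, keywords, threshold=0.5):
    """Run both stages: collapse redundancy, then organize by topic keyword."""
    similar = cluster_by_similarity(sentences, threshold)
    representatives = [cluster[0] for cluster in similar]
    return cluster_by_keyword(representatives, keywords)


sentences = [
    "The storm hit the coast on Monday",
    "The storm hit the coast Monday night",
    "Rescue teams reached the coast",
    "The election results were announced",
]
topics = double_cluster(sentences, ["storm", "coast", "election"])
```

The first stage removes redundancy across documents by keeping one sentence per similarity cluster; the second stage spreads the surviving sentences over topics, so a summary can draw from each keyword group.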

TM-Gen: A Topic Map Generator from Text Documents

Garrido

Buey

Escudero

et al.

2013

2013 IEEE 25th International Conference on Tools With Artificial Intelligence


The vast amount of text documents stored in digital format is growing at a frantic rhythm each day. Therefore, tools able to find accurate information by searching in natural language information repositories are gaining great interest in recent years. In this context, there are especially interesting tools capable of dealing with large amounts of text information and deriving human-readable summaries. However, one step further is to be able not only to summarize, but to extract the knowledge stored in those texts, and even represent it graphically. In this paper we present an architecture to generate automatically a conceptual representation of knowledge stored in a set of text-based documents. For this purpose we have used the topic maps standard and we have developed a method that combines text mining, statistics, linguistic tools, and semantics to obtain a graphical representation of the information contained therein, which can be coded using a knowledge representation language such as RDF or OWL. The procedure is language-independent, fully automatic, self-adjusting, and it does not need manual configuration by the user. Although the validation of a graphic knowledge representation system is very subjective, we have been able to take advantage of an intermediate product of the process to make an experimental validation of our proposal.


“…They further extended bimixture PLSA to incorporate the sentence information, and proposed bimixture PLSA with sentence bases (Bi-PLSAS) to simultaneously cluster and summarize the documents utilizing the mutual influence of the document clustering and summarization procedures. Silveira and Branco [2012] proposed a method for extractive multi-document summarization that explores a two-phase clustering approach. Zhang et al. [2012] proposed to rank sentences from a document by exploiting the mutual effects between terms, sentences, and clusters.…”

Section: Related Work (mentioning)

confidence: 99%

Combining co-clustering with noise detection for theme-based summarization

Cai

Zhang

2013

ACM Trans. Speech Lang. Process.


To overcome the fact that the length of sentences is short and their content is limited, we regard words as independent text objects rather than features of sentences in sentence clustering and develop two co-clustering frameworks, namely integrated clustering and interactive clustering, to cluster sentences and words simultaneously. Since real-world datasets always contain noise, we incorporate noise detection and removal to enhance clustering of sentences and words. Meanwhile, a semi-supervised approach is explored to incorporate the query information (and the sentence information in early document sets) in theme-based summarization. Thorough experimental studies are conducted. When evaluated on the DUC 2005-2007 datasets and TAC 2008-2009 datasets, the performance of the two noise-detecting co-clustering approaches is comparable with that of the top three systems. The results also demonstrate that the interactive with noise detection algorithm is more effective than the noise-detecting integrated algorithm.
