2006
DOI: 10.2172/894745

QCS: a system for querying, clustering and summarizing documents.

Abstract: Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system-the Query, Cluster, Summarize (QCS) system-which is portable, modular, and permits experimentation with different instantiations of each of the constituent text …
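The modular query, cluster, summarize pipeline described in the abstract can be sketched as below. This is a toy illustration of the three-stage architecture only; every function name and the trivial stage implementations are hypothetical, not the actual QCS components.

```python
# Hypothetical sketch of a modular query -> cluster -> summarize pipeline,
# in the spirit of the QCS design. Names are illustrative, not the QCS API,
# and each stage is a deliberately crude stand-in that could be swapped out.

def query(documents, terms):
    """Retrieval stage: return documents containing any query term."""
    return [d for d in documents if any(t in d.lower() for t in terms)]

def cluster(documents):
    """Clustering stage: group retrieved documents by a crude key
    (their first word) as a stand-in for a real clustering component."""
    groups = {}
    for d in documents:
        key = d.split()[0].lower()
        groups.setdefault(key, []).append(d)
    return groups

def summarize(group):
    """Summarization stage: pick the shortest document in a cluster
    as a placeholder 'summary'."""
    return min(group, key=len)

def qcs_pipeline(documents, terms):
    """Chain the three interchangeable stages end to end."""
    retrieved = query(documents, terms)
    return {key: summarize(group) for key, group in cluster(retrieved).items()}
```

Because each stage only consumes the previous stage's output, any one of them can be replaced with a different instantiation without touching the others, which is the portability/modularity property the abstract claims.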


Cited by 12 publications (23 citation statements)
References 4 publications
“…In this section, we compare the summary performance of MOGAs with that of five other methods widely used in automatic document summarization: CRF (Shen et al, 2007), Manifold-Ranking (Wan et al, 2007), NetSum (Svore et al, 2007), QCS (Dunlavy et al, 2007), and SVM (Yeh et al, 2004). Table 1 and Table 2 show the results of all the methods in terms of F-measure, ROUGE-1, and ROUGE-2 metrics on the DUC01 and DUC02 datasets, respectively.…”
Section: Performance and Discussion
confidence: 99%
“…In this section, the performance of our method is compared with other well‐known or recently proposed methods. Comparison of the proposed method was made against the following methods: (a) DPSO‐EDASum (optimization approach based on discrete PSO and EDA; Alguliev, Aliguliyev, & Mehdiyev, ), (b) LexRank (graph‐based approach; Erkan & Radev, ), (c) CollabSum (clustering and graph‐ranking based approach; Wan et al, ), (d) UnifiedRank (graph‐based approach; Wan, ), (e) 0–1 non‐linear (binary optimization based on discrete PSO approach; Alguliev, Aliguliyev, & Isazade, ), (f) QCS (machine learning approach based on hidden Markov model; Dunlavy et al, ), (g) SVM (algebraic approach; Yeh et al, ), (h) FEOM (fuzzy evolutionary approach; Song et al, ), (i) CRF (machine learning approach based on CRF; Shen et al, ), (j) MA‐SingleDocSum (metaheuristic approach based on genetic operators and guided local search; Mendoza et al, ), (k) NetSum (machine learning approach based on neural nets; Svore et al, ), (l) manifold ranking (probabilistic approach using greedy algorithm; Wan et al, ), (m) ESDS‐GHS‐GLO (binary optimization based on the global‐best harmony search heuristic, a greedy local search algorithm; Mendoza et al, ), and (n) DE (clustering and metaheuristic based approach; Aliguliyev, ). These methods have been chosen for comparison because they have achieved the best results on the DUC2001 and DUC2002 data sets.…”
Section: Methods
confidence: 99%
“…Dunlavy et al. [7] rely on a Hidden Markov Model (HMM) to create the summary of a document, which consists of the top-N sentences with the highest probability values of features computed using the HMM. The features used in the HMM include (i) the number of signature terms in a sentence, i.e., terms that are more likely to occur in a given document rather than in the collection to which the document belongs, (ii) the number of subject terms, i.e., signature terms that occur in headline or subject leading sentences, and (iii) the position of the sentence in the document.…”
Section: Comparing the Performance of CorSum(-SF)
confidence: 99%
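The three sentence features this citing paper attributes to the QCS summarizer (signature-term count, subject-term count, sentence position) can be sketched as follows. This is a simplified stand-in under assumed definitions, not the actual HMM-based scorer from Dunlavy et al.; the frequency-ratio threshold for signature terms is an illustrative choice.

```python
# Illustrative feature extraction for HMM-style extractive summarization.
# A "signature term" is approximated here as a term whose in-document
# frequency exceeds its collection frequency by a chosen ratio; this
# threshold and the data structures are assumptions, not the QCS method.

def signature_terms(doc_tokens, doc_freq, collection_freq, threshold=2.0):
    """Terms notably more frequent in this document than in the collection."""
    sig = set()
    for term in set(doc_tokens):
        ratio = doc_freq.get(term, 0) / max(collection_freq.get(term, 1), 1)
        if ratio >= threshold:
            sig.add(term)
    return sig

def sentence_features(sentence_tokens, position, sig_terms, subject_terms):
    """Return the feature triple for one sentence:
    (# signature terms, # subject terms, position in document)."""
    n_sig = sum(1 for t in sentence_tokens if t in sig_terms)
    n_subj = sum(1 for t in sentence_tokens if t in subject_terms)
    return (n_sig, n_subj, position)
```

In the described approach, an HMM would map such feature triples to state probabilities, and the top-N sentences by probability would form the summary.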
“…The features used in the HMM include (i) the number of signature terms in a sentence, i.e., terms that are more likely to occur in a given document rather than in the collection to which the document belongs, (ii) the number of subject terms, i.e., signature terms that occur in headline or subject leading sentences, and (iii) the position of the sentence in the document. Since the HMM tends to select longer sentences to be included in a summary [7], sentences are trimmed by removing lead adverbs and conjunctions, gerund phrases, and restricted relative-clause noun phrases.…”
Section: Comparing the Performance of CorSum(-SF)
confidence: 99%