Abstract: By researching all kinds of methods for document clustering, we put forward a new dynamic method based on genetic algorithm (GA). K-means is a greedy algorithm, which is sensitive to the choice of cluster center and very easily results in local optimization. Genetic algorithm is a global convergence algorithm, which can find the best cluster centers easily. Among the traditional document clustering methods, the document similar matrix is a sparse matrix. In this paper, we propose some new formulas improved on …
“…Finally, since the objective function is the most distinguished portion of evolutionary algorithm, a summary of all fitness functions adopted for all disciplines is going be discuss in the next section (section 5). Wei et al (2009) put forward a new dynamic method based on GA for document clustering. The method established on a new formula for describing the similarities of Chinese text documents.…”
Section: Fig 5 Main Researches' Disciplines In Document Clustering mentioning
confidence: 99%
“…Additionally, most of these researches dealt with the problem as a maximization problem, except in (Wei et al, 2009) and 2010;2012) because the intraclustering and BIC are minimization in its nature. While in and Lee et al, 2011;Lee and Park, 2012;Song and Park, 2006;2007a;2007b), the researchers adopted the inverse of the DB index to convert the problem into a maximization problem.…”
Section: The Objective Functions Used In Document Clustering mentioning
Document clustering is the process of organizing an electronic corpus of documents into subgroups with similar text features. A number of conventional algorithms have been applied to document clustering, and current efforts aim to enhance clustering performance by employing evolutionary algorithms. These efforts have become an emerging topic that has gained more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It first provides a comprehensive inspection of the document clustering model, revealing its various components and related concepts. It then presents and analyzes the principal research work on this topic. Finally, it compiles and classifies the various objective functions, the core of evolutionary algorithms, from the related collection of research papers. The paper ends by addressing some important issues and challenges that can be the subject of future work.
“…E.g. if two chromosomes of 5 centers are (1,4,6,7,9) and (5,11,10,8,1). First find out common center i.e.…”
Section: Crossover Operator Of GA and DDE mentioning
confidence: 99%
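The crossover excerpt above is cut off right after "First find out common center", so only that first step can be illustrated. The following is a minimal sketch, assuming a chromosome is a sequence of document indices used as cluster centers; the function names `common_centers` and `split_by_common` are hypothetical and do not come from the cited paper, and whatever the operator does after this step is not shown in the excerpt.

```python
def common_centers(parent_a, parent_b):
    """Return the centers that appear in both parents, in parent_a's order."""
    in_b = set(parent_b)
    return [c for c in parent_a if c in in_b]


def split_by_common(parent_a, parent_b):
    """Separate shared centers from the centers unique to each parent.

    Assumption: this only covers the 'find the common center' step quoted
    in the excerpt; the rest of the crossover operator is not reproduced.
    """
    shared = common_centers(parent_a, parent_b)
    shared_set = set(shared)
    only_a = [c for c in parent_a if c not in shared_set]
    only_b = [c for c in parent_b if c not in shared_set]
    return shared, only_a, only_b


# The two chromosomes quoted in the excerpt.
shared, only_a, only_b = split_by_common((1, 4, 6, 7, 9), (5, 11, 10, 8, 1))
print(shared, only_a, only_b)
```

For the two chromosomes in the excerpt, the only shared center is 1, which leaves four centers unique to each parent.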
“…E.g. if one chromosome of 5 centers is (1,4,6,7,9) and we want to update second gene's value, so we will replace 4 by value v ϵ {1, 2, … , n}-{1, 4, 6, 7, 9}.…”
Section: Mutation Operator Of GA and DDE mentioning
confidence: 99%
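The mutation rule quoted above, replacing one gene by a value v ∈ {1, 2, … , n} − {current centers}, can be sketched as follows. This is an illustrative sketch rather than the cited paper's implementation: the function name `mutate_gene`, the zero-based `position` argument, and the uniform random choice among the remaining indices are all assumptions.

```python
import random


def mutate_gene(chromosome, position, n, rng=random):
    """Replace the gene at `position` with an index not already in the chromosome.

    chromosome: sequence of document indices (1..n) used as cluster centers.
    position:   zero-based index of the gene to replace (assumed convention).
    n:          total number of documents.
    rng:        random source; uniform sampling is an assumption of this sketch.
    """
    used = set(chromosome)
    candidates = [v for v in range(1, n + 1) if v not in used]
    mutated = list(chromosome)
    if candidates:  # if every index is already a center, leave the gene as is
        mutated[position] = rng.choice(candidates)
    return mutated


# Update the second gene (value 4) of the chromosome from the excerpt, with n = 12.
child = mutate_gene((1, 4, 6, 7, 9), position=1, n=12, rng=random.Random(0))
print(child)
```

Excluding the whole current chromosome, not just the gene being replaced, keeps the centers distinct after mutation, which matches the set difference in the quoted rule.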
“…Wei Jian-Xiang [1] introduces that clustering algorithms can be broadly divided into two basic categories: hierarchical and non-hierarchical. K-means is a most widely used algorithm.…”
Clustering in data mining is a discovery process that groups a set of documents such that documents within a cluster have high similarity while documents in different clusters have low similarity. K-means is a popular clustering method, but its results depend on the choice of cluster centers, so it easily converges to a local optimum. Genetic Algorithm (GA) is an optimization method that can be applied to find the best cluster centers, but it sometimes takes more iterations to find them. In this paper, we combine features of GA with features of Discrete Differential Evolution (DDE) to solve the text document clustering problem. To test the efficiency of our algorithm, we use a sample of the Reuters-21578 database. The experimental results show that our algorithm performs better than GA and DDE.