In recent years, text mining has become an important topic because of the growth of digital text data from many sources, such as government documents, email, social media, and websites. English poems are one such kind of text data, and they can be categorized using text categorization, a method that classifies documents into one or more predefined categories based on their text content. In this paper, we address the problem of assigning an English poem to one of a set of poem categories by using text mining techniques and a machine learning algorithm. Our data set consists of seven poem categories and is divided into two parts: training (learning) data and testing data. In the proposed model, we apply text preprocessing to the document files to reduce the number of features and the dimensionality. The preprocessing converts each poem's text into features and removes irrelevant features using text mining steps (tokenization, stop-word removal, and stemming). To reduce the feature vector further, we use two feature selection methods, and we use rough set theory as the machine learning algorithm to perform the categorization. The proposed model achieves an 88% classification success rate.
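The preprocessing steps named in this abstract (tokenization, stop-word removal, and stemming) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the suffix-stripping rule in `crude_stem`, and all function names are assumptions made for illustration, and a real system would more likely use an established stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are", "on", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Strip one common English suffix; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, then stem what is left."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in kept]

features = preprocess("The birds are singing in the gardens")
print(features)  # ['bird', 'sing', 'garden']
```

Each step shrinks the vocabulary: stop-word removal discards tokens that carry little category information, and stemming merges inflected forms into one feature, which is how the pipeline reduces dimensionality before feature selection.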
Owing to the huge amounts of data collected in databases, cluster analysis has recently become a highly active topic in data mining research. In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. This paper proposes two new partitioning clustering methods: the first is a modified k-means clustering algorithm with Variable Neighborhood Search as a metaheuristic search, and the second is a modified k-means clustering algorithm with cuckoo search as a swarm intelligence technique. The proposed algorithms do not require the number of clusters as input; instead, they determine it automatically, using a clustering validity measure to find the best clustering. This is their fundamental characteristic. The experiments were carried out on databases of many different sizes, some of them obtained from the University of California (UC) Irvine Machine Learning Repository, which maintains 246 data sets as a service to the machine learning community. From these experiments, it is concluded that these methods reduce the time needed to reach the best solution to about half the time otherwise needed to perform the same actions, and at the same time reduce the number of iterations needed to reach the best solution. In addition, the proposed clustering methods give the best quality (in terms of performance) compared with other clustering methods; performance improved by 10% to 20% compared with the original k-means clustering method.
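The idea of letting a validity measure pick the number of clusters can be sketched as follows. This is only an illustration under stated assumptions, not the proposed method: it runs plain k-means for every candidate k and keeps the k with the highest mean silhouette, whereas the paper searches with Variable Neighborhood Search or cuckoo search and does not name its validity measure in this abstract. The farthest-point initialization, the toy data, and all function names are likewise assumptions.

```python
import math

def farthest_first(points, k):
    """Deterministic initialization: repeatedly pick the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on a list of numeric tuples; returns one label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_max=6):
    """Score every k from 2 to k_max and return the one with the best validity score."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get)

# Toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (20, 0), (20, 1), (21, 0), (21, 1)]
print(choose_k(points))  # 3
```

Scanning every k is the simplest possible search strategy and its cost grows with `k_max`; a metaheuristic such as the ones the abstract names would presumably explore the space of clusterings more economically.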