2019
DOI: 10.11591/ijece.v9i2.pp1313-1320

Feature selection, optimization and clustering strategies of text documents

Abstract: Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. The research of clustering task is to be performed prior to its adaptation in the text environment. Conventional approaches typically emphasized the quantitative information where the selected features are numbers. Efforts also have been…

Cited by 13 publications (7 citation statements)
References 13 publications
“…In this study, we used the result of the calculation of cosine similarity [17] to represent the degree of message similarity. Cosine similarity [18][19] is the traditional method used to measure the degree of similarity between two vectors, obtained from the cosine angle multiplication. The Cosine similarity [8] can be calculated using term frequency and inverse document frequency (TF-IDF) formulas.…”
Section: The Similarity Of Chat Messages (mentioning)
confidence: 99%
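
The statement above describes cosine similarity computed over TF-IDF weights. As a rough illustration only (not the citing authors' implementation), the sketch below computes the cosine as the dot product of two vectors divided by the product of their lengths. The whitespace tokenizer, the smoothed IDF formula, and the function names `tfidf_vectors` and `cosine_similarity` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Assumptions: whitespace tokenization, raw term counts as TF,
    and a smoothed IDF of ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)
```

With this weighting, two messages that share no terms score 0, and a message compared with itself scores 1 (up to floating-point rounding).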
“…Feature selection approach try to find a subset of the original variables (also called attributes or features). In this process three different strategies can be used one is filter for information gain, wrapper is used for accuracy and embedded is used to add or remove while constructing the model based on the predicted errors [11]. In some data analysis cases such as classification or regression can be done in the reduced space more exactly than the original data space.…”
Section: Feature Selection (mentioning)
confidence: 99%
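
Of the three strategies named above, the filter approach is the simplest to sketch: score each feature independently and keep the best ones. The example below is a minimal illustration that assumes discrete feature values and the availability of class labels; the helper names `entropy`, `information_gain`, and `select_top_k` are hypothetical and do not come from the cited work.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """H(labels) minus the entropy of labels conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(features, labels, k):
    """Filter-style selection: rank features by information gain, keep k."""
    scores = {
        name: information_gain(values, labels)
        for name, values in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A wrapper strategy would instead retrain a model for each candidate subset, which is far more expensive; the filter shown here never consults a model.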
“…To measure the distance between two points Euclidean distance metric is took the major role, at the same time easily measure the data by using ruler for two and three dimensional spaces also. Sometimes Euclidean will also be selected in clustering [11].…”
Section: Euclidean Distance (mentioning)
confidence: 99%
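
The Euclidean distance mentioned above is the square root of the summed squared coordinate differences, and in clustering it is typically used to assign a point to its closest cluster center. The sketch below shows both steps under that assumption; `euclidean_distance` and `nearest_centroid` are illustrative names, not functions from the cited paper.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (the assignment step
    used by centroid-based clustering such as k-means)."""
    return min(
        range(len(centroids)),
        key=lambda i: euclidean_distance(point, centroids[i]),
    )
```

For example, the points (0, 0) and (3, 4) are 5 units apart, matching what a ruler would measure in the plane.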
“…Objects are similar inside the same cluster whereas dissimilar compared to objects descending from other clusters. Clustering, as a class of unsupervised classification method, has been widely applied in different domains, machine learning, image segmentation, pattern recognition, text mining and many other domains [1][2][3]. Great number of clustering algorithms lie in literature, the famous K-mean clustering [4], hierarchical clustering [5], k-medoids [6], and mean shift [7] have been considered in various problems.…”
Section: Introduction (mentioning)
confidence: 99%