2018
DOI: 10.1177/0165551518816302

DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering

Abstract: In this article, a new initial centroid selection method for the K-means document clustering algorithm, namely Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means (DIC-DOC-K-means), is proposed to improve the performance of text document clustering. The first centroid is the document having the minimum standard deviation of its term frequency. Each subsequent centroid is selected based on its dissimilarities to the previously selected centroids. For comparing the perform…
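
A minimal sketch of the selection rule described in the abstract, in Python. Only the first step (pick the document whose term frequencies have the minimum standard deviation) comes from the abstract. The dissimilarity measure (cosine dissimilarity, i.e. 1 minus cosine similarity), the use of the population standard deviation, and the rule for combining dissimilarities (pick the document whose smallest dissimilarity to the already selected centroids is largest) are assumptions. `select_initial_centroids` and `cosine_dissimilarity` are hypothetical helpers, not the authors' code.

```python
import math
import statistics
from typing import List, Sequence


def cosine_dissimilarity(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity (an ASSUMED choice of dissimilarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def select_initial_centroids(tf: List[List[float]], k: int) -> List[int]:
    """Return the indices of k documents to use as initial centroids (hypothetical helper)."""
    if not 1 <= k <= len(tf):
        raise ValueError("k must be between 1 and the number of documents")
    # Step 1 (from the abstract): the document whose term-frequency vector
    # has the minimum (population) standard deviation.
    first = min(range(len(tf)), key=lambda i: statistics.pstdev(tf[i]))
    chosen = [first]
    # Step 2 (ASSUMED rule): repeatedly add the document whose nearest
    # already-chosen centroid is the most dissimilar (farthest-first selection).
    while len(chosen) < k:
        candidates = [i for i in range(len(tf)) if i not in chosen]
        best = max(
            candidates,
            key=lambda i: min(cosine_dissimilarity(tf[i], tf[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen
```

For example, with `tf = [[2, 2, 2, 2], [5, 0, 0, 1], [0, 4, 0, 0], [1, 1, 0, 3]]`, `select_initial_centroids(tf, 3)` returns `[0, 2, 1]`: document 0 has zero spread, document 2 is the most dissimilar to it, and document 1 is then the candidate whose nearest chosen centroid is farthest away. The returned indices would then seed an ordinary K-means run.
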

Cited by 19 publications (14 citation statements)
References: 32 publications (54 reference statements)

“…In this paper, two essential changes have been applied to PCKmeans: (i) in the section of initialization and (ii) in the section of calculating centers of clusters. Furthermore, the penalty of violation from constraints is created automatically [12,47].…”
Section: Methods (mentioning; confidence: 99%)

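The quoted passage does not spell out the modified initialization or center calculation. For orientation only, here is a minimal sketch of the assignment step of standard PCKmeans (pairwise-constrained K-means): a point's cost for cluster h is half its squared Euclidean distance to that cluster's centroid plus a penalty for each must-link partner placed in another cluster and each cannot-link partner placed in the same cluster. Using one fixed, hypothetical weight `w` for every constraint is an assumption; the cited work creates its penalties automatically, which this sketch does not attempt. `pck_assign` is an illustrative name.

```python
from typing import Dict, Sequence, Set


def pck_assign(
    i: int,
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    centroids: Sequence[Sequence[float]],
    must_link: Dict[int, Set[int]],
    cannot_link: Dict[int, Set[int]],
    w: float = 1.0,  # HYPOTHETICAL uniform penalty weight
) -> int:
    """Pick the cluster for point i that minimizes the PCKmeans cost,
    holding the current labels of all other points fixed."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best, best_cost = 0, float("inf")
    for h, mu in enumerate(centroids):
        cost = 0.5 * sq_dist(points[i], mu)
        # Penalty for each must-link partner currently in a different cluster.
        cost += w * sum(1 for j in must_link.get(i, set()) if labels[j] != h)
        # Penalty for each cannot-link partner currently in the same cluster.
        cost += w * sum(1 for j in cannot_link.get(i, set()) if labels[j] == h)
        if cost < best_cost:
            best, best_cost = h, cost
    return best
```

Constraint dictionaries are assumed to be symmetric (if 0 is must-linked to 1, then 1 is must-linked to 0). With a large `w`, a constraint can override the nearest-centroid choice; with a small one, distance dominates.
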
“…The Euclidean distance is used to calculate the distance between other samples and the cluster center, and the sample points are grouped into the class with the closest distance to the cluster center. Then, the mean value of each class is used as the new clustering center, and the samples are re-classified into k classes [48,49]. Thus, iterative calculations are performed until the cluster centroids no longer change.…”
Section: Evaluation of Uniformity Based on Cluster Analysis, 2.3.1 Clustering Analysis Algorithm (mentioning; confidence: 99%)

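The loop described in this statement is the standard Lloyd-style K-means iteration: assign each point to its nearest centroid by Euclidean distance, replace each centroid with the mean of its cluster, and repeat until the centroids stop changing. A minimal pure-Python sketch, assuming dense numeric vectors and caller-supplied initial centroids (for example, ones produced by an initialization rule such as DIC-DOC-K-means). `kmeans` and `euclidean` are illustrative names, and `max_iter` is an added safeguard not mentioned in the quote.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    points: List[Point], centroids: List[Point], max_iter: int = 100
) -> Tuple[List[int], List[Point]]:
    """Lloyd-style K-means starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid as-is
        # Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

For instance, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` settles after two passes with labels `[0, 0, 1, 1]` and centroids `[(0.0, 0.5), (10.0, 10.5)]`. Because the result depends on the starting centroids, initialization schemes like the one proposed in this paper target exactly that input.
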
“…It is compared with the existing U-K mean method. Lakshmi and Baskar [22] proposed a new initial centroid selection method of K-means document clustering algorithm, namely, DIC doc-K-means initial centroid selection based on dissimilarity, to improve the performance of text document clustering.…”
Section: Introduction (mentioning; confidence: 99%)