2022
DOI: 10.11591/ijeecs.v29.i1.pp545-552

Reclust: an efficient clustering algorithm for mixed data based on reclustering and cluster validation

Abstract: Clustering is a significant approach in data mining, which seeks to find groups or clusters of data. Both numeric and categorical features are frequently used to define the data in real-world applications. Several different clustering algorithms are proposed for the numerical and categorical datasets. In clustering algorithms, the quality of clustering results is evaluated using cluster validation. This paper proposes an efficient clustering algorithm for mixed numerical and categorical data using …

Cited by 4 publications (5 citation statements)
References 19 publications
“…In real-world applications, both numeric and categorical features are often used to define the data. Clustering analysis is one of the most important approaches in DM, and it seeks to find the nature of groupings or clusters of data objects within an attribute space [8,11,16]. For an exploratory approach, we applied clustering analysis to the dataset in Appendix B.…”
Section: Cluster Analysis (mentioning)
confidence: 99%
“…(32), knn (49), svm (81) and rf (69), to the right of the graph characterised by a strongly positive coordinate on the axis, to individuals such as MCDA C (58), characterised by a strongly negative coordinate on the axis (to the left of the graph). Dimension 2 opposes individuals such as lstm (54), word2vec (88), nlp (63) and BIM (16), who at the top of the graph, and characterised by a low positive co-ordinate on the axis, with individuals such as ann (8), adaboost (3), who have low negative coordinate on the axis and are located at the bottom of the graph. The Dim1, group 1 (dt, knn, svm and rf) is sharing high values for the variables "predicting", "supervised", "monitoring", "frequency", "institutional data", "data project-simulation-signal", "classifying", "best method" and "interview-literature-text" (variables are sorted from the strongest).…”
Section: Inertia Distribution (mentioning)
confidence: 99%
“…The nodes that make up the output grid accommodate only one class type, but sometimes this does not happen. Therefore, an analysis of cluster purity is conducted to uniquely assign a single class to each cell in the map [49]. Purity is a metric for how much a cluster contains a single class (Equation (21)).…”
Section: Quality Of Self-organizing Map (mentioning)
confidence: 99%
“…The K-Means algorithm works on numeric attributes and partitions the data into a number of groups [20]. The K-Means algorithm begins by choosing the number K at random and taking K members of the population to serve as the initial center points [21].…”
Section: K-means (unclassified)