2021
DOI: 10.3390/app11083509

Learning-Based Dissimilarity for Clustering Categorical Data

Abstract: Comparing data objects is at the heart of machine learning. For continuous data, object dissimilarity is usually taken to be object distance; however, for categorical data, there is no universal agreement, for categories can be ordered in several different ways. Most existing category dissimilarity measures characterize the distance among the values an attribute may take using precisely the number of different values the attribute takes (the attribute space) and the frequency at which they occur. These kinds o…
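The frequency-based measures the abstract describes can be made concrete with a small sketch. The function below implements a generic occurrence-frequency-style dissimilarity in which mismatches on rare values weigh more than mismatches on common ones; it is a minimal illustration of the family of measures the abstract surveys, not the learned measure the paper proposes, and the toy column is an assumption.

```python
from collections import Counter
from math import log

def freq_dissimilarity(x, y, counts, n):
    """Generic frequency-weighted categorical dissimilarity (illustrative).

    Matching values get 0; mismatches are weighted so that disagreement
    on rare values counts more than disagreement on common ones.
    """
    if x == y:
        return 0.0
    # Rarer values -> larger log(n / frequency) -> larger dissimilarity.
    return log(n / counts[x]) * log(n / counts[y])

# Toy attribute column: 3 "red", 2 "blue", 1 "green".
column = ["red", "red", "red", "blue", "blue", "green"]
counts, n = Counter(column), len(column)
print(freq_dissimilarity("red", "green", counts, n))  # ~1.24 (rare mismatch)
print(freq_dissimilarity("red", "blue", counts, n))   # ~0.76 (common mismatch)
```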

Cited by 7 publications (7 citation statements). References 17 publications.
“…The linear programming model outperformed the traditional and other enhanced k-modes algorithms on categorical datasets [8]. The learning-based dissimilarity for categorical data clustering outperforms alternatives on several performance indicators [16]. Compared to k-modes, the k-Approximate Modal Haplotype achieves an average performance increase of 0.51 percent in Precision and 0.40 percent in Normalized Discounted Cumulative Gain.…”
Section: Related Work (mentioning; confidence: 99%)
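For context on what the cited comparisons hold fixed: k-modes, in Huang's standard formulation, scores categorical objects with simple-matching dissimilarity, i.e. the count of attributes on which two objects disagree. A minimal sketch (the toy records are assumptions):

```python
def simple_matching(a, b):
    """Simple-matching dissimilarity used by standard k-modes:
    the number of attributes on which two categorical objects disagree."""
    return sum(ai != bi for ai, bi in zip(a, b))

x = ("red", "small", "round")
y = ("red", "large", "square")
print(simple_matching(x, y))  # 2 mismatched attributes
```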
“…In the confusion matrix [16], four quantities are defined: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP counts the cases in which the optimized feature is accurately predicted from the features in the feature-space collection.…”
Section: B. Parameter Settings (mentioning; confidence: 99%)
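A minimal sketch of how the four counts are tallied from binary labels (the toy vectors and the 0/1 encoding are assumptions):

```python
def confusion_counts(y_true, y_pred):
    """Tally TP, FP, TN, FN for binary labels, with 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)   # 2 1 2 1
print(tp / (tp + fp))   # precision derived from the counts: ~0.67
```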
“…Our proposed improved K-prototypes with the dissimilarity measure in equation (8) is compared with other typical clustering algorithms using different dissimilarity measures, based on various evaluation criteria for clustering algorithms. Rivera-Ríos [22] summarized a variety of dissimilarity measures. The dissimilarity measures that can handle mixed attributes, such as the Euclidean distance of K-means++, the dissimilarity measure of K-prototypes, and the dissimilarity measure of the equi-biased K-prototypes in [19], are selected for comparison with our proposed dissimilarity measure.…”
Section: Performance Evaluation of the Improved K-prototypes (mentioning; confidence: 99%)
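The kind of mixed-attribute measure being compared can be sketched from Huang's standard K-prototypes formulation: a squared-Euclidean term on the numeric attributes plus a weight γ times a simple-matching term on the categorical ones. The sketch below follows that standard form, not the paper's equation (8), which is not reproduced here; the toy records and γ value are assumptions.

```python
def k_prototypes_dissimilarity(x_num, x_cat, y_num, y_cat, gamma=1.0):
    """Huang-style K-prototypes dissimilarity: squared Euclidean distance on
    numeric attributes plus gamma times the categorical mismatch count."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    categorical = sum(a != b for a, b in zip(x_cat, y_cat))
    return numeric + gamma * categorical

# Toy mixed-type records: (age, income) numeric; (color, shape) categorical.
x = ((30.0, 50.0), ("red", "round"))
y = ((32.0, 48.0), ("red", "square"))
print(k_prototypes_dissimilarity(x[0], x[1], y[0], y[1], gamma=0.5))  # 8.5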
“…The performance of a clustering algorithm can be evaluated by external and internal evaluation criteria [22,23,24]. Internal evaluation criteria are calculated from the dissimilarity of objects within clusters; since the dissimilarity measures of the algorithms differ, their internal evaluation results are not on the same scale.…”
Section: Performance Evaluation of the Improved K-prototypes (mentioning; confidence: 99%)
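The distinction can be made concrete with scikit-learn: external criteria (e.g. adjusted Rand index, normalized mutual information) compare assignments against ground-truth labels and are comparable across algorithms, while an internal criterion such as the silhouette score is computed from the chosen dissimilarity itself and so lives on an algorithm-dependent scale. A minimal sketch (toy data and labels are assumptions):

```python
import numpy as np
from sklearn.metrics import (adjusted_rand_score,
                             normalized_mutual_info_score,
                             silhouette_score)

X = np.array([[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]])
labels_true = [0, 0, 1, 1]  # ground truth, required by external criteria
labels_pred = [1, 1, 0, 0]  # same partition as the truth, labels permuted

# External criteria: permutation-invariant, comparable across algorithms.
print(adjusted_rand_score(labels_true, labels_pred))           # 1.0
print(normalized_mutual_info_score(labels_true, labels_pred))  # 1.0

# Internal criterion: depends on the distance measure, so scores from
# algorithms with different dissimilarities are not on the same scale.
print(silhouette_score(X, labels_pred))
```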
“…It is important to note that multivariate situations involving categorical variables, or a mix of categorical and numerical variables, have been studied within specific areas, such as the processing of mixed-type data and categorical data clustering [28,29,30]. However, these tools are applicable to observation points, whereas statistical interaction occurs between variables in any given dataset.…”
Section: Introduction (mentioning; confidence: 99%)