2021
DOI: 10.22266/ijies2021.0430.05

A Similarity based K-Means Clustering Technique for Categorical Data in Data Mining Application

Abstract: Clustering plays a major role in data mining applications because it divides and groups data effectively. In pattern analysis, two major challenges arise in real-life applications: handling categorical data and the availability of correctly labeled data. Clustering techniques are designed to group unlabeled data according to its characteristics of homogeneity. Some important issues such as high memory utilization, time consumption, overhead, computation complexity and less…
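For readers unfamiliar with clustering of categorical data, the sketch below shows a k-modes-style procedure: records are compared with a simple matching dissimilarity (count of disagreeing attributes) and cluster centers are per-attribute modes. This is only an illustration of the general idea, not the paper's algorithm; every name, parameter, and data value in it is an assumption.

```python
# Illustrative sketch only: a k-modes-style clustering of categorical data
# using simple matching dissimilarity. NOT the paper's exact algorithm.
import numpy as np

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    return np.sum(a != b)

def k_modes(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=object)
    # Initialize modes by sampling k distinct records.
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: nearest mode under matching dissimilarity.
        labels = np.array([
            np.argmin([matching_dissimilarity(x, m) for m in modes]) for x in X
        ])
        # Update step: per-attribute most frequent category in each cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            for col in range(X.shape[1]):
                values, counts = np.unique(members[:, col], return_counts=True)
                modes[j, col] = values[np.argmax(counts)]
    return labels, modes

if __name__ == "__main__":
    data = [
        ["red", "small", "yes"],
        ["red", "small", "no"],
        ["blue", "large", "no"],
        ["blue", "large", "yes"],
    ]
    labels, modes = k_modes(data, k=2)
    print(labels, modes)
```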

Cited by 9 publications (5 citation statements)
References 20 publications
“…However, they differ in their approach to clustering. K-means clustering tries to partition the data into k clusters by minimizing the sum of squared distances between the data points and their assigned cluster centers [35-38]. In contrast, spectral clustering looks at the relationships between data points rather than their distances in the feature space.…”
Section: Driving Behavior Clustering
confidence: 99%
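The quoted statement describes standard K-means as minimizing the sum of squared distances between points and their assigned centers. A minimal NumPy sketch of that objective via Lloyd's iterations (illustrative only, not taken from any of the cited papers) is:

```python
# Minimal sketch of Lloyd's k-means: alternate assignment to the nearest
# center and recomputation of centers as cluster means, which locally
# minimizes the sum of squared distances to the assigned centers.
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

X = np.random.default_rng(1).normal(size=(200, 2))
labels, centers = kmeans(X, k=3)
```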
“…Selecting good initial cluster centers leads to better performance of the clustering algorithm, while a poor selection of initial centers degrades it. The authors in [10] proposed the INCK algorithm, an improved version of the K-medoids clustering algorithm. Instead of a random selection of the initial medoids, the INCK algorithm selects the initial medoids from a set of chosen data objects that is free from noise and outliers.…”
Section: Determining the Optimal Number of Clusters • Handling High...
confidence: 99%
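The INCK idea quoted above, choosing initial medoids from data objects judged free of noise and outliers rather than at random, could look roughly like the sketch below. The outlier filter used here (average distance to the nearest neighbours, thresholded at the mean plus one standard deviation) is an assumption for illustration, not the actual selection rule from [10].

```python
# Hedged sketch: pick initial medoids from an outlier-free candidate set
# instead of purely at random. The density-style filter is an illustrative
# assumption, not the exact INCK selection rule.
import numpy as np

def initial_medoids(X, k, n_neighbors=5, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Average distance to the n_neighbors nearest other points.
    avg_nn = np.sort(d, axis=1)[:, 1:n_neighbors + 1].mean(axis=1)
    # Keep points whose neighbourhood distance is not unusually large.
    threshold = avg_nn.mean() + avg_nn.std()
    candidates = np.flatnonzero(avg_nn <= threshold)
    # Choose k initial medoids from the filtered candidate set.
    return rng.choice(candidates, size=k, replace=False)

X = np.random.default_rng(2).normal(size=(100, 2))
medoid_idx = initial_medoids(X, k=3)
```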
“…Handling categorical data: this is accomplished by mapping the categorical data to the numbers 0 and 1 (Customer Type, Travel Type, Satisfaction) using ordinal encoding [44].…”
Section: Step 3: Applying Feature Engineering
confidence: 99%
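The quoted preprocessing step maps binary categorical columns to the integers 0 and 1. A minimal sketch with scikit-learn's OrdinalEncoder is shown below; the column names follow the quote, but the example values are made up for illustration.

```python
# Sketch of ordinal encoding for binary categorical columns; the values
# and data frame contents are illustrative, following the quoted step.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "Customer Type": ["Loyal Customer", "Disloyal Customer", "Loyal Customer"],
    "Travel Type": ["Business Travel", "Personal Travel", "Business Travel"],
    "Satisfaction": ["satisfied", "dissatisfied", "satisfied"],
})

# Each category is replaced by an integer code (0 or 1 per binary column).
encoder = OrdinalEncoder()
df[df.columns] = encoder.fit_transform(df)
print(df)
```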