2021
DOI: 10.22266/ijies2021.0430.05

A Similarity based K-Means Clustering Technique for Categorical Data in Data Mining Application

Abstract: Clustering plays a major role in data mining applications because it divides and groups data effectively. In pattern analysis, two major challenges arise in real-life applications: handling categorical data and the availability of correctly labeled data. Clustering techniques are designed to group unlabeled data according to its characteristics of homogeneity. Some important issues such as high memory utilization, time consumption, overhead, computation complexity and less…
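For readers unfamiliar with clustering of categorical data, the sketch below shows a k-modes-style procedure: records are compared with a simple matching dissimilarity (count of disagreeing attributes) and cluster centers are per-attribute modes. This is only an illustration of the general idea, not the paper's algorithm; every name, parameter, and data value in it is an assumption.

```python
# Illustrative sketch only: a k-modes-style clustering of categorical data
# using simple matching dissimilarity. NOT the paper's exact algorithm.
import numpy as np

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    return np.sum(a != b)

def k_modes(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=object)
    # Initialize modes by sampling k distinct records.
    modes = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: nearest mode under matching dissimilarity.
        labels = np.array([
            np.argmin([matching_dissimilarity(x, m) for m in modes]) for x in X
        ])
        # Update step: per-attribute most frequent category in each cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            for col in range(X.shape[1]):
                values, counts = np.unique(members[:, col], return_counts=True)
                modes[j, col] = values[np.argmax(counts)]
    return labels, modes

if __name__ == "__main__":
    data = [
        ["red", "small", "yes"],
        ["red", "small", "no"],
        ["blue", "large", "no"],
        ["blue", "large", "yes"],
    ]
    labels, modes = k_modes(data, k=2)
    print(labels, modes)
```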

Cited by 9 publications (5 citation statements)
References 20 publications
“…However, they differ in their approach to clustering. K-means clustering tries to partition the data into k clusters by minimizing the sum of squared distances between the data points and their assigned cluster centers [35-38]. In contrast, spectral clustering looks at the relationships between data points rather than their distances in the feature space.…”
Section: Driving Behavior Clustering
confidence: 99%
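The quoted statement describes standard K-means as minimizing the sum of squared distances between points and their assigned centers. A minimal NumPy sketch of that objective via Lloyd's iterations (illustrative only, not taken from any of the cited papers) is:

```python
# Minimal sketch of Lloyd's k-means: alternate assignment to the nearest
# center and recomputation of centers as cluster means, which locally
# minimizes the sum of squared distances to the assigned centers.
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

X = np.random.default_rng(1).normal(size=(200, 2))
labels, centers = kmeans(X, k=3)
```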
“…Selecting good initial cluster centers leads to better performance of the clustering algorithm, while a poor selection of initial centers degrades it. The authors in [10] proposed the INCK algorithm, an improved version of the K-medoids clustering algorithm. Instead of a random selection of the initial medoids, the INCK algorithm selects the initial medoids from a set of chosen data objects that is free from noise and outliers.…”
Section: Determining the Optimal Number of Clusters • Handling High...
confidence: 99%
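The INCK idea quoted above, choosing initial medoids from data objects judged free of noise and outliers rather than at random, could look roughly like the sketch below. The outlier filter used here (average distance to the nearest neighbours, thresholded at the mean plus one standard deviation) is an assumption for illustration, not the actual selection rule from [10].

```python
# Hedged sketch: pick initial medoids from an outlier-free candidate set
# instead of purely at random. The density-style filter is an illustrative
# assumption, not the exact INCK selection rule.
import numpy as np

def initial_medoids(X, k, n_neighbors=5, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Average distance to the n_neighbors nearest other points.
    avg_nn = np.sort(d, axis=1)[:, 1:n_neighbors + 1].mean(axis=1)
    # Keep points whose neighbourhood distance is not unusually large.
    threshold = avg_nn.mean() + avg_nn.std()
    candidates = np.flatnonzero(avg_nn <= threshold)
    # Choose k initial medoids from the filtered candidate set.
    return rng.choice(candidates, size=k, replace=False)

X = np.random.default_rng(2).normal(size=(100, 2))
medoid_idx = initial_medoids(X, k=3)
```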
“…Handling categorical data: this is accomplished by mapping the categorical data to the numbers 0 and 1 (Customer Type, Travel Type, Satisfaction) using ordinal encoding [44].…”
Section: Step 3: Applying Feature Engineering
confidence: 99%
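The quoted preprocessing step maps binary categorical columns to the integers 0 and 1. A minimal sketch with scikit-learn's OrdinalEncoder is shown below; the column names follow the quote, but the example values are made up for illustration.

```python
# Sketch of ordinal encoding for binary categorical columns; the values
# and data frame contents are illustrative, following the quoted step.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "Customer Type": ["Loyal Customer", "Disloyal Customer", "Loyal Customer"],
    "Travel Type": ["Business Travel", "Personal Travel", "Business Travel"],
    "Satisfaction": ["satisfied", "dissatisfied", "satisfied"],
})

# Each category is replaced by an integer code (0 or 1 per binary column).
encoder = OrdinalEncoder()
df[df.columns] = encoder.fit_transform(df)
print(df)
```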