A novel K-means based clustering algorithm for big data

Sinha, Ankita; Jana, Prasanta K.

doi:10.1109/icacci.2016.7732323

Cited by 15 publications

(7 citation statements)

References 10 publications


“…A parallel implementation of k means algorithm over spark is proposed in [36] large scale text and UCI datasets. In another paper, the authors addressed the issue of predetermining the number of input clusters which is a present problem in most K-means methods by automating the number of input clusters which resulted in better clustering quality when processing large scale data [37].…”

Section: A4 Scalable Methods

mentioning

confidence: 99%

Peer Review #3 of "Big data clustering techniques based on Spark: a literature review (v0.1)"

2020



“…In addition, authors proposed a cluster pruning concept to augment K-Means algorithm to reduce clusters to reduce search space for further computation. A similar effort was made by Sinha and Jana [17] who focused on performing automated cluster formation to cope up with Big Data analytics problems. Considering significance of distance metric in K-Means clustering, Niu [9] applied block function which collects instances as blocks to cluster attributes.…”

Section: Introduction

mentioning

confidence: 98%

“…have been developed. However, most of these algorithms employ fixed stopping criteria that forces algorithm to undergo huge computational overheads and time consumption [17], [18]. Here the Author [31] said that Hadoop solves the main problem of processing and storage.…”

Section: Introduction

mentioning

confidence: 99%


Enhanced Evolutionary Computing Assisted K-Means Clustering Algorithm for BigData Analytics

Sarada

2020

IJATCSE


The exponential rise in internet technologies and allied applications has given rise to Big Data, a technology that intends to process very large volumes of data to support real-time analytics and decision-making. However, rapidly increasing data heterogeneity, non-linearity, multi-dimensional features and unannotated data force classical approaches to incur huge computational overheads and deliver limited accuracy, which confines their suitability for major Big Data analytics purposes. With this motivation, this paper develops a robust Big Data analytics model that incorporates Min-Max normalization, Dual Phased Feature Selection (DPFS) and enhanced Adaptive Genetic Algorithm (AGA) assisted K-Means clustering. Here, Min-Max normalization helps alleviate key issues such as data heterogeneity, data imbalance and premature convergence during computation. Unlike classical feature selection approaches, DPFS exploits both a Pearson correlation assisted significance test and T-Test analysis to ensure optimal feature selection for further computation. In addition, the AGA assisted K-Means clustering algorithm achieves computationally efficient and reliable clustering for Big Data analytics. Notably, adaptive, fitness-sensitive GA parameter selection lets the proposed system perform better without imposing computational overheads. The computational efficacy of AGA-K-Means can strengthen MapReduce for use in real-time Big Data analytics applications.
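The Min-Max normalization step named in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula used, so the code assumes the standard column-wise rescaling (x - min) / (max - min) onto [0, 1]; the function name `min_max_normalize` and the handling of constant columns are illustrative choices, not taken from the paper.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1].

    Assumption: standard min-max scaling, (x - min) / (max - min).
    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    if not rows:
        return []
    columns = list(zip(*rows))            # transpose: one tuple per feature
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Two features on very different scales end up on a common [0, 1] scale.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
scaled = min_max_normalize(data)
```

Putting features on a common scale matters for K-Means because Euclidean distance would otherwise be dominated by the feature with the largest numeric range.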


“…K-DBSCAN: An improved DBSCAN algorithm for big data In [28], presented in 2016, the dataset was divided into smaller parts and distributed to several nodes in a cluster of machines. Apache Hadoop was used as a scalable, powerful platform for this purpose.…”

mentioning

confidence: 99%

K-DBSCAN: An improved DBSCAN algorithm for big data

2020


Big data storage and processing are among the most important challenges now. Among data mining algorithms, DBSCAN is a common clustering method. One of the most important drawbacks of this algorithm is its low execution speed. This study aims to accelerate the DBSCAN execution speed so that the algorithm can respond to big datasets in an acceptable period of time. To overcome the problem, an initial grouping was applied to the data in this article through the K-means++ algorithm. DBSCAN was then employed to perform clustering in each group separately. As a result, the computational burden of DBSCAN execution reduced and the clustering execution speed increased significantly. Finally, border clusters were merged if necessary. According to the results of executing the proposed algorithm, it managed to greatly reduce the DBSCAN execution time (98% in the best-case scenario) with no significant changes in the qualitative evaluation criteria for clustering.
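The three stages described in this abstract (an initial K-means++ grouping, DBSCAN inside each group, then merging of border clusters) can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how border clusters are merged, so the sketch assumes two clusters are merged whenever any two of their points lie within `eps` of each other. All function names and parameter values are hypothetical.

```python
import math
import random


def kmeans_pp_partition(points, k, iters=10, seed=0):
    """Split point indices into k groups: k-means++ seeding, then Lloyd steps."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # k-means++: sample the next center proportionally to squared distance.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    dims = len(points[0])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(i)
        centers = [
            tuple(sum(points[i][d] for i in g) / len(g) for d in range(dims))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return groups


def dbscan(points, idx, eps, min_pts):
    """Plain DBSCAN over the indices in idx; returns {index: label}, -1 = noise."""
    labels, cluster = {}, 0

    def neighbors(i):
        return [j for j in idx if math.dist(points[i], points[j]) <= eps]

    for i in idx:
        if i in labels:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster      # former noise becomes a border point
            if j in labels:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:     # only core points expand the cluster
                queue.extend(nb_j)
        cluster += 1
    return labels


def k_dbscan(points, k, eps, min_pts):
    """Partition, cluster each partition, then merge touching clusters."""
    labels, offset = {}, 0
    for group in kmeans_pp_partition(points, k):
        local = dbscan(points, group, eps, min_pts)
        for i, lab in local.items():
            labels[i] = lab if lab == -1 else lab + offset
        offset += 1 + max(local.values(), default=-1)

    parent = list(range(offset))         # union-find over global cluster ids

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Assumed merge rule: clusters with any two points within eps are joined.
    clustered = [i for i in labels if labels[i] != -1]
    for a in clustered:
        for b in clustered:
            if a < b and math.dist(points[a], points[b]) <= eps:
                parent[find(labels[a])] = find(labels[b])
    return [-1 if labels[i] == -1 else find(labels[i]) for i in range(len(points))]


# Two tight blobs plus one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 5)]
result = k_dbscan(pts, k=2, eps=1.5, min_pts=3)
```

In this toy run each blob should come out as one cluster and the isolated point as noise (`-1`). The speed-up reported in the abstract comes from DBSCAN only comparing points within the same group; the quadratic merge loop here is kept naive for readability and would need a spatial index or a border-only check at scale.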

