2015 IEEE High Performance Extreme Computing Conference (HPEC)
DOI: 10.1109/hpec.2015.7322467
Accelerating K-Means clustering with parallel implementations and GPU computing

Cited by 56 publications (35 citation statements)
References 9 publications
“…Exploring K-means on GPUs [16,17] has also been done, but the latest FPGA implementations of K-means date back more than fifteen years [18].…”
Section: Related Work (mentioning)
confidence: 99%
“…The numbers of iterations used are 10 and 50. The results of the sequential and parallel software methods are taken from Janki's work [16].…”
Section: AWS (mentioning)
confidence: 99%
“…We report results against a heterogeneous-node-based approach running a custom implementation of parallel k-means on ten heterogeneous nodes, each consisting of an NVIDIA Tesla K20M GPU and two Intel Xeon E5-2620 CPUs [35]. Further, we compare against two GPU-based implementations running on an NVIDIA Tesla K20M GPU and an NVIDIA Tesla K20C GPU respectively [4], [26], an FPGA-based approach running a custom parallel k-means implementation on a Xilinx ZC706 FPGA [29], and a multi-core-processor-based approach running a custom implementation of parallel k-means on an 8-core Intel i7-3770k processor [15]. The proposed approach running on the Sunway TaihuLight supercomputer achieves more than 100x speedup over the high-performance heterogeneous-node-based approach, between 50x and 70x speedup over the single-GPU-based approaches, and 31x speedup over the multi-core CPU-based approach on their largest solvable workload sizes.…”
Section: Comparison With Other Architectures (mentioning)
confidence: 99%
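The architectures compared above all exploit the same property of Lloyd-style k-means: the dominant cost, assigning each point to its nearest centroid, is independent per point and therefore maps naturally onto GPUs, FPGAs, and multi-core CPUs. The sketch below is a minimal NumPy illustration of that structure, not any of the cited implementations; all function names and defaults here are illustrative assumptions.

```python
import numpy as np

def assign_points(points, centroids):
    # Nearest-centroid assignment: each point is handled independently,
    # which is exactly the step parallel k-means implementations distribute.
    d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def update_centroids(points, labels, centroids):
    # Recompute each centroid as the mean of its members; keep the old
    # centroid if a cluster ends up empty.
    new = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members) > 0:
            new[j] = members.mean(axis=0)
    return new

def lloyd_kmeans(points, k, iters=10, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign_points(points, centroids)
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels
```

In the parallel variants discussed by the citing papers, the distance/assignment step is split across threads, GPU blocks, or nodes, and only the per-cluster sums are combined in a reduction step.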
“…Recently, sophisticated projects have emerged in the study of Spark application performance, such as PREDIcT [21] and RISE-2016 [12]. PREDIcT is a tool that includes a set of prediction techniques for different areas of data analytics, while RISE-2016 is a collection of scalable performance-prediction techniques for big data processing in distributed multi-core systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…A Node data structure is initialized at line 9; • In a similar manner, the following phase (lines 12-16) is invoked (line 51) in order to remove the lock taken earlier on the node if the following conditions are met: i) no more tasks need to be executed, ii) no other user has locked the node, and iii) there are no other stages to start. If all the conditions are met, the lock placed by the current user on the node can be released.…”
Section: Task Precedence Model (mentioning)
confidence: 99%
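Read as a protocol, the quoted conditions amount to a conjunctive check before releasing a node's lock. The following is a minimal sketch under assumed names: Node, its fields (locked_by, pending_tasks, pending_stages), and maybe_release are hypothetical stand-ins for the citing paper's data structures, not its actual API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical fields standing in for the cited node state.
    locked_by: Optional[str] = None   # user currently holding the lock, if any
    pending_tasks: int = 0            # tasks still to be executed on this node
    pending_stages: int = 0           # stages still to be started

def maybe_release(node: Node, user: str) -> bool:
    # Release the lock only when all three quoted conditions hold:
    # i) no more tasks to execute, ii) no other user holds the lock,
    # iii) no other stages to start.
    no_tasks = node.pending_tasks == 0
    held_only_by_user = node.locked_by == user
    no_stages = node.pending_stages == 0
    if no_tasks and held_only_by_user and no_stages:
        node.locked_by = None
        return True
    return False
```

If any condition fails, the lock is kept, matching the excerpt's rule that the current user's lock is released only once all three conditions are satisfied.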