SC18: International Conference for High Performance Computing, Networking, Storage and Analysis 2018
DOI: 10.1109/sc.2018.00016
Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers

Abstract: This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. Our multi-level (nkd) approach unlocks the potential of the hierarchical parallelism in the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design is able to process large-scale clustering problems with up …
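The abstract's multi-level (nkd) idea is that the dominant cost of a Lloyd's iteration, the point-to-centroid distance computation, can be blocked along all three axes: data points (n), centroids (k), and dimensions (d). A minimal sketch of that blocking is below; the function names and block sizes are illustrative assumptions, not the paper's actual implementation, and the loops stand in for the parallel partitions that the paper maps onto the SW26010 hierarchy.

```python
import numpy as np

def blocked_assignment(X, C, bn=2, bk=2, bd=2):
    """Assign each point in X (n x d) to its nearest centroid in C (k x d),
    accumulating squared distances block by block along n, k, and d."""
    n, d = X.shape
    k = C.shape[0]
    dist = np.zeros((n, k))
    for i0 in range(0, n, bn):          # partition by data points (n)
        for j0 in range(0, k, bk):      # partition by centroids (k)
            for l0 in range(0, d, bd):  # partition by dimensions (d)
                xb = X[i0:i0 + bn, l0:l0 + bd]
                cb = C[j0:j0 + bk, l0:l0 + bd]
                # partial squared distances over this dimension block
                diff = xb[:, None, :] - cb[None, :, :]
                dist[i0:i0 + bn, j0:j0 + bk] += (diff ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def update_centroids(X, labels, k):
    """Recompute each centroid as the mean of its assigned points
    (assumes no cluster ends up empty)."""
    return np.vstack([X[labels == j].mean(axis=0) for j in range(k)])
```

Because the squared distance is a sum over dimensions, each (n, k, d) block contributes an independent partial sum, which is what makes the three-way partition legal; the blocked result is identical to the unblocked computation.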

Cited by 11 publications (7 citation statements)
References 29 publications
“…Our K-Means implementation partitions the problem based on data size, since its application to climate data is a weak scaling problem. We process much larger data sizes, though [51] showcases good performance for much higher dimensionality (up to O(10E6)) and cluster counts (O(10E6)) than our use case. For comparison, in our capstone problem, the K-Means stage of the DisCo workflow processes ∼70E9 lightcones (∼70E6/node) of 84 dimensions into 8 clusters in 2.32 s/iteration on Intel E5-2698 v3 (vs. 2.5E6 samples of 68 dimensions into 10,000 clusters in 2.42 s/iteration on 16 nodes of Intel i7-3770K processors in [51]).…”
Section: Related Work
confidence: 97%
“…[49] is an extension of this work for larger datasets of billions of points, and [50] optimizes K-Means performance on Intel KNC processors by efficient vectorization. The authors of [51] propose a hierarchical scheme for partitioning data based on data flow, centroids (clusters), and dimensions. Our K-Means implementation partitions the problem based on data size, since its application to climate data is a weak scaling problem.…”
Section: Related Work
confidence: 99%
“…It was tested on an eight-core system and achieved a significant speedup over the naive parallel implementation of Lloyd's method. Li et al [41] and Li et al [42] proposed two implementations of Lloyd's algorithm for the SW26010 processor used in the Sunway TaihuLight supercomputer. (At the time of this writing, it was fourth on the list of the Top 500 supercomputers [43]).…”
Section: Related Research
confidence: 99%
“…The algorithm was tested on a system with two quad-core Intel CPUs, giving a significant speedup over a naive implementation. [35] and [36] describe two implementations of Lloyd's algorithm for the SW26010 many-core processor used in the Sunway TaihuLight supercomputer (at the time of writing this paper it was third on the Top500 supercomputer list [37]). While the former focuses on a fine-tuned kernel running on a single processor, the latter discusses the implementation on thousands of nodes of TaihuLight.…”
Section: Related Work
confidence: 99%