2019
DOI: 10.1007/s11390-019-1900-5

Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight

Cited by 10 publications (5 citation statements). References: 41 publications.

“…It was tested on an eight-core system and achieved a significant speedup over the naive parallel implementation of Lloyd's method. Li et al [41] and Li et al [42] proposed two implementations of Lloyd's algorithm for the SW26010 processor used in the Sunway TaihuLight supercomputer. (At the time of this writing, it was fourth on the list of the Top 500 supercomputers [43]).…”
Section: Related Research (mentioning; confidence: 99%)

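The quoted statements all refer to Lloyd's algorithm, the standard k-means iteration. For orientation only, here is a minimal pure-Python sketch of that loop: an assignment step (each point goes to its nearest centroid by squared Euclidean distance) alternating with an update step (each centroid becomes the mean of its points), stopping when assignments no longer change. It is not the SW26010 kernel described in the paper; the function name `lloyd_kmeans`, its parameters, random initialization, and the empty-cluster policy are illustrative assumptions.

```python
import random


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd iteration on a list of coordinate tuples."""
    rng = random.Random(seed)
    # Assumption: initialize centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(min(range(k), key=dists.__getitem__))
        # Converged: assignments did not change, so centroids are already their means.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: recompute each centroid as the mean of its assigned points.
        dim = len(points[0])
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, labels):
            counts[j] += 1
            for d, v in enumerate(p):
                sums[j][d] += v
        for j in range(k):
            if counts[j]:  # assumption: an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids, labels


# Usage: two well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = lloyd_kmeans(data, k=2)
print(labels)
```

The assignment step dominates the cost (every point is compared with every centroid in every iteration), which is why the implementations cited above concentrate on mapping that step onto many-core hardware.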
“…The algorithm was tested on a system with two quadcore Intel CPUs giving a significant speedup over a naive implementation. [35] and [36] describe two implementations of Lloyd's algorithm for the SW26010 many-core processor used in Sunway TaihuLight supercomputer (at the time of writing this paper it was third on the Top500 supercomputer list [37]). While the previous work focuses on fine-tuned kernel running on a single processor, the latter discusses the implementation on thousands of nodes of TaihuLight.…”
Section: Related Work (mentioning; confidence: 99%)

“…Li M et al implemented k-means on sw26010 manycore processor, and sustains a double-precision performance of over 348.1 Gflops. The result is 84% of the theoretical performance upper bound on a single core group [38]. Wang yichao et al employed OpenACC to port GTC-P on the "Sunway Taihulight" supercomputer and achieve 2.5x speedup [39].…”
Section: Related Work (mentioning; confidence: 99%)
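The last statement gives two figures for the single-core-group result: over 348.1 Gflops sustained, described as 84% of the theoretical performance upper bound. Dividing the first by the second recovers the bound they imply, roughly 414 Gflops. Both quoted figures are rounded, so treat this as an approximation derived from the quote, not a number the statement itself reports; the variable names below are illustrative.

```python
# Figures as quoted in the citation statement for [38].
sustained_gflops = 348.1   # sustained double-precision performance on one core group
fraction_of_bound = 0.84   # "84% of the theoretical performance upper bound"

# Implied upper bound (approximate, since both inputs are rounded).
implied_bound_gflops = sustained_gflops / fraction_of_bound
print(f"implied upper bound ~ {implied_bound_gflops:.1f} Gflops")
```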