Proceedings of the 2017 SIAM International Conference on Data Mining
DOI: 10.1137/1.9781611974973.31
Multi-core K-means

Abstract: Today's microprocessors consist of multiple cores each of which can perform multiple additions, multiplications, or other operations simultaneously in one clock cycle. To maximize performance, two types of parallelism must be applied in a data mining algorithm: MIMD (Multiple Instruction Multiple Data) where different CPU cores execute different code and follow different threads of control, and SIMD (Single Instruction Multiple Data) where within a core, the same operation is executed at once on various data. …
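As a sketch of how the two parallelism levels combine in the K-means assignment step (illustrative code, not the paper's implementation): the outer loop over points can be distributed across CPU cores (MIMD, e.g. with an OpenMP parallel-for), while the inner distance loop applies the same operation to every dimension and is a natural target for compiler auto-vectorization (SIMD).

```c
#include <stddef.h>

/* Find the nearest of k centers (each of dimension d) for point p.
 * Hypothetical helper for illustration:
 * - MIMD: calls to assign_point for different points are independent
 *   and can run on different cores/threads.
 * - SIMD: the inner loop over dimensions performs the same
 *   subtract/multiply/add on every component, so the compiler can
 *   process several floats per instruction. */
size_t assign_point(const float *p, const float *centers,
                    size_t k, size_t d) {
    size_t best = 0;
    float best_dist = 1e30f;
    for (size_t j = 0; j < k; j++) {
        float dist = 0.0f;
        /* vectorizable reduction: same operation on all d components */
        for (size_t t = 0; t < d; t++) {
            float diff = p[t] - centers[j * d + t];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = j;
        }
    }
    return best;
}
```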

Cited by 18 publications (17 citation statements)
References 14 publications (18 reference statements)
“…We must keep track of the winner distance and the corresponding cluster ID for each point. This can be facilitated in a SIMD-parallel way by backpacking [9] the cluster ID into the least significant bits of the distance, denoted ⟨dist, cID⟩:…”
Section: K-means Clustering
Confidence: 99%
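The ⟨dist, cID⟩ packing quoted above works because non-negative IEEE-754 floats sort in the same order as their bit patterns interpreted as unsigned integers. A minimal sketch in C (the 8-bit ID width and the helper names are assumptions for illustration, not the authors' code):

```c
#include <stdint.h>
#include <string.h>

/* Pack a non-negative distance and a small cluster ID into one 32-bit
 * word. Overwriting the 8 least significant mantissa bits with the ID
 * only perturbs the distance negligibly, and because bit-pattern order
 * matches value order for non-negative floats, a plain unsigned min
 * selects the winning <dist, cID> pair in a single SIMD-friendly step. */
uint32_t pack_dist_id(float dist, uint32_t id) {
    uint32_t bits;
    memcpy(&bits, &dist, sizeof bits);   /* well-defined type pun */
    return (bits & ~0xFFu) | (id & 0xFFu);
}

/* Recover the cluster ID from a packed word. */
uint32_t packed_id(uint32_t packed) { return packed & 0xFFu; }

/* Unsigned min doubles as "min distance, carrying its cluster ID". */
uint32_t packed_min(uint32_t a, uint32_t b) { return a < b ? a : b; }
```

In a vectorized kernel the same idea applies lane-wise: a SIMD unsigned-min over packed words tracks winner distance and winner ID together, with no separate blend for the ID.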
“…Here, we extended our K-means implementation [9] with the Hilbert curve. We use the same comparison methods as for matrix multiplication but exclude the Peano-curve-based algorithm by Bader et al. [11], [13], since this approach is not designed to support K-means and is outperformed by MKL-BLAS on the task of matrix multiplication.…”
Section: K-means
Confidence: 99%
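The Hilbert curve mentioned here maps 2-D grid coordinates to a 1-D index while preserving locality, which is what makes it attractive for cache-friendly traversal orders. A textbook (x, y) → d conversion (the standard algorithm, not the authors' code) looks like:

```c
#include <stddef.h>

/* Rotate/flip a quadrant so the sub-curve has the right orientation. */
void rot(int n, int *x, int *y, int rx, int ry) {
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x;  /* swap x and y */
        *x = *y;
        *y = t;
    }
}

/* Map grid cell (x, y) on an n-by-n grid (n a power of two) to its
 * distance d along the Hilbert curve; sorting points by d yields a
 * locality-preserving linear order. */
int xy2d(int n, int x, int y) {
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        rot(n, &x, &y, rx, ry);
    }
    return d;
}
```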
“…2.3.1 General Parallel k-means: The k-means algorithm has been widely implemented on parallel architectures with shared and distributed memory, using either the SIMD or MIMD model, targeting multi-core processors [5], [14], [20], GPU-based heterogeneous systems [28], [39], [41], and clusters of computers/clouds [11], [22]. In the parallel case, we use l to index the processors (computing units) P (P = {P_l}, l ∈ {1 .…”
Section: Related Work
Confidence: 99%
“…Processors should communicate with each other before the final c_j can be updated.

[5]                         Multi-core          MIMD/SIMD             10^7    40    20
Hadian and Shahrivari [20]  Multi-core          multi-thread          10^9    100   68
Zechner and Granitzer [41]  GPU                 CUDA                  10^6    128   200
Li, et al. [28]             GPU                 CUDA                  10^7    512   160
Haut, et al. [22]           Cloud               OpenStack             10^8    8     58
Cui, et al. [11]            Cluster             Hadoop                10^5    100   9
Supercomputer-oriented k-means implementations:
Kumar, et al. [27]          Jaguar, Oak Ridge   MPI                   10^10   1000  30
Cai, et al. [7]             Gordon, SDSC        mclapply (parallel R) 10^6    8     8
Bender, et al. [3]          T

Kumar, et al. [27] implemented the dataflow-partition-based parallel k-means on Jaguar, a Cray XT5 supercomputer at Oak Ridge National Laboratory, evaluated on real-world geographical datasets. Their implementation applies MPI protocols for broadcasting and reduction and originally scaled the value of k to the 1,000s.…”
Section: Related Work
Confidence: 99%
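The communication pattern noted above, partial accumulation on each processor P_l followed by a reduction before the final centroids c_j are updated, can be sketched as follows (function and parameter names are illustrative, not from any cited implementation; on a cluster the merge step would typically be an MPI reduction, on a multi-core CPU a shared-memory merge):

```c
#include <stddef.h>

/* Each processor accumulates, for its local share of the points,
 * per-cluster coordinate sums and member counts. */
void local_accumulate(const float *points, const size_t *labels,
                      size_t n_local, size_t d,
                      float *sum, size_t *count, size_t k) {
    for (size_t j = 0; j < k; j++) count[j] = 0;
    for (size_t j = 0; j < k * d; j++) sum[j] = 0.0f;
    for (size_t i = 0; i < n_local; i++) {
        size_t c = labels[i];
        count[c]++;
        for (size_t t = 0; t < d; t++)
            sum[c * d + t] += points[i * d + t];
    }
}

/* Merge one processor's partial results into the global accumulators;
 * only after all partials are reduced can c_j = gsum_j / gcount_j
 * be computed. */
void reduce_partials(float *gsum, size_t *gcount,
                     const float *lsum, const size_t *lcount,
                     size_t k, size_t d) {
    for (size_t j = 0; j < k; j++) gcount[j] += lcount[j];
    for (size_t j = 0; j < k * d; j++) gsum[j] += lsum[j];
}
```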
“…By construction, this graph can be well mapped to 2D space; we even have a ground truth for the embedding in the form of the originally sampled (x, y) coordinates. In the bottom row of Figure 1 we show the ground-truth coordinates (open circles) together with the embedding result (red points, each connected to its ground-truth point by a line), after globally rotating and aligning them to the ground truth by a technique called Procrustes analysis.…”
Section: Introduction
Confidence: 99%