Proceedings of the 2017 SIAM International Conference on Data Mining
DOI: 10.1137/1.9781611974973.31
Multi-core K-means

Abstract: Today's microprocessors consist of multiple cores each of which can perform multiple additions, multiplications, or other operations simultaneously in one clock cycle. To maximize performance, two types of parallelism must be applied in a data mining algorithm: MIMD (Multiple Instruction Multiple Data) where different CPU cores execute different code and follow different threads of control, and SIMD (Single Instruction Multiple Data) where within a core, the same operation is executed at once on various data. …
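As a sketch of how the two parallelism levels combine in the K-means assignment step (illustrative code, not the paper's implementation): the outer loop over points can be distributed across CPU cores (MIMD, e.g. with an OpenMP parallel-for), while the inner distance loop applies the same operation to every dimension and is a natural target for compiler auto-vectorization (SIMD).

```c
#include <stddef.h>

/* Find the nearest of k centers (each of dimension d) for point p.
 * Hypothetical helper for illustration:
 * - MIMD: calls to assign_point for different points are independent
 *   and can run on different cores/threads.
 * - SIMD: the inner loop over dimensions performs the same
 *   subtract/multiply/add on every component, so the compiler can
 *   process several floats per instruction. */
size_t assign_point(const float *p, const float *centers,
                    size_t k, size_t d) {
    size_t best = 0;
    float best_dist = 1e30f;
    for (size_t j = 0; j < k; j++) {
        float dist = 0.0f;
        /* vectorizable reduction: same operation on all d components */
        for (size_t t = 0; t < d; t++) {
            float diff = p[t] - centers[j * d + t];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = j;
        }
    }
    return best;
}
```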

Cited by 18 publications (17 citation statements)
References 14 publications (18 reference statements)
“…We must keep track of the winner distance and the corresponding cluster ID for each point. This can be facilitated in a SIMD-parallel way by backpacking [9] the cluster ID into the least significant bits of the distance, denoted ⟨dist, cID⟩:…”
Section: K-means Clustering
Confidence: 99%
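The ⟨dist, cID⟩ packing quoted above works because non-negative IEEE-754 floats sort in the same order as their bit patterns interpreted as unsigned integers. A minimal sketch in C (the 8-bit ID width and the helper names are assumptions for illustration, not the authors' code):

```c
#include <stdint.h>
#include <string.h>

/* Pack a non-negative distance and a small cluster ID into one 32-bit
 * word. Overwriting the 8 least significant mantissa bits with the ID
 * only perturbs the distance negligibly, and because bit-pattern order
 * matches value order for non-negative floats, a plain unsigned min
 * selects the winning <dist, cID> pair in a single SIMD-friendly step. */
uint32_t pack_dist_id(float dist, uint32_t id) {
    uint32_t bits;
    memcpy(&bits, &dist, sizeof bits);   /* well-defined type pun */
    return (bits & ~0xFFu) | (id & 0xFFu);
}

/* Recover the cluster ID from a packed word. */
uint32_t packed_id(uint32_t packed) { return packed & 0xFFu; }

/* Unsigned min doubles as "min distance, carrying its cluster ID". */
uint32_t packed_min(uint32_t a, uint32_t b) { return a < b ? a : b; }
```

In a vectorized kernel the same idea applies lane-wise: a SIMD unsigned-min over packed words tracks winner distance and winner ID together, with no separate blend for the ID.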
“…Here, we extended our K-means implementation [9] with the Hilbert curve. We use the same comparison methods as for matrix multiplication but exclude the Peano-curve-based algorithm by Bader et al. [11], [13], since this approach is not designed to support K-means and is outperformed by MKL-BLAS on the task of matrix multiplication.…”
Section: K-means
Confidence: 99%
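The Hilbert curve mentioned here maps 2-D grid coordinates to a 1-D index while preserving locality, which is what makes it attractive for cache-friendly traversal orders. A textbook (x, y) → d conversion (the standard algorithm, not the authors' code) looks like:

```c
#include <stddef.h>

/* Rotate/flip a quadrant so the sub-curve has the right orientation. */
void rot(int n, int *x, int *y, int rx, int ry) {
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        int t = *x;  /* swap x and y */
        *x = *y;
        *y = t;
    }
}

/* Map grid cell (x, y) on an n-by-n grid (n a power of two) to its
 * distance d along the Hilbert curve; sorting points by d yields a
 * locality-preserving linear order. */
int xy2d(int n, int x, int y) {
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
        int rx = (x & s) > 0;
        int ry = (y & s) > 0;
        d += s * s * ((3 * rx) ^ ry);
        rot(n, &x, &y, rx, ry);
    }
    return d;
}
```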
“…2.3.1 General Parallel k-means: The k-means algorithm has been widely implemented on parallel architectures with shared and distributed memory, using either the SIMD or MIMD model, targeting multi-core processors [5], [14], [20], GPU-based heterogeneous systems [28], [39], [41], and clusters of computers/clouds [11], [22]. In the parallel case, we use l to index the processors (computing units) P (P = {P_l}, l ∈ {1 .…”
Section: Related Work
Confidence: 99%
“…Processors should communicate with each other before the final c_j can be updated.

[5]                         Multi-core          MIMD/SIMD             10^7    40    20
Hadian and Shahrivari [20]  Multi-core          multi-thread          10^9    100   68
Zechner and Granitzer [41]  GPU                 CUDA                  10^6    128   200
Li, et al. [28]             GPU                 CUDA                  10^7    512   160
Haut, et al. [22]           Cloud               OpenStack             10^8    8     58
Cui, et al. [11]            Cluster             Hadoop                10^5    100   9
Supercomputer-oriented k-means implementations:
Kumar, et al. [27]          Jaguar, Oak Ridge   MPI                   10^10   1000  30
Cai, et al. [7]             Gordon, SDSC        mclapply (parallel R) 10^6    8     8
Bender, et al. [3]          T

Kumar, et al. [27] implemented the dataflow-partition-based parallel k-means on Jaguar, a Cray XT5 supercomputer at Oak Ridge National Laboratory, evaluated on real-world geographical datasets. Their implementation applies MPI protocols for broadcasting and reduction and originally scaled the value of k to the 1,000s.…”
Section: Related Work
Confidence: 99%
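The communication pattern noted above, partial accumulation on each processor P_l followed by a reduction before the final centroids c_j are updated, can be sketched as follows (function and parameter names are illustrative, not from any cited implementation; on a cluster the merge step would typically be an MPI reduction, on a multi-core CPU a shared-memory merge):

```c
#include <stddef.h>

/* Each processor accumulates, for its local share of the points,
 * per-cluster coordinate sums and member counts. */
void local_accumulate(const float *points, const size_t *labels,
                      size_t n_local, size_t d,
                      float *sum, size_t *count, size_t k) {
    for (size_t j = 0; j < k; j++) count[j] = 0;
    for (size_t j = 0; j < k * d; j++) sum[j] = 0.0f;
    for (size_t i = 0; i < n_local; i++) {
        size_t c = labels[i];
        count[c]++;
        for (size_t t = 0; t < d; t++)
            sum[c * d + t] += points[i * d + t];
    }
}

/* Merge one processor's partial results into the global accumulators;
 * only after all partials are reduced can c_j = gsum_j / gcount_j
 * be computed. */
void reduce_partials(float *gsum, size_t *gcount,
                     const float *lsum, const size_t *lcount,
                     size_t k, size_t d) {
    for (size_t j = 0; j < k; j++) gcount[j] += lcount[j];
    for (size_t j = 0; j < k * d; j++) gsum[j] += lsum[j];
}
```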
“…By construction, this graph can be well mapped to 2D space; we even have a ground truth for the embedding in the form of the originally sampled (x, y) coordinates. In the bottom row of Figure 1 we show the ground-truth coordinates (open circles) together with the embedding result (red points, each connected to its ground-truth point by a line), after globally rotating and aligning them to the ground truth by a technique called Procrustes analysis.…”
Section: Introduction
Confidence: 99%