2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
DOI: 10.1109/ipdps47924.2020.00022
Optimizing High Performance Markov Clustering for Pre-Exascale Architectures

Cited by 14 publications (10 citation statements)
References 28 publications (41 reference statements)
“…Over the past decade, CombBLAS made significant progress in (a) developing new algorithms for sparse-matrix primitives [7], (b) implementing algorithms to extract high performance from heterogeneous distributed systems with CPUs and GPUs [33], (c) demonstrating extreme scalability using communication-avoiding algorithms that scale to the limit of supercomputers [36], [37], and (d) providing customized functionality for several high-impact applications in computational biology [38], [9] and scientific computing [17]. While many of these advances have already been published separately, we show the overall impact of moving from CombBLAS 1.0 to CombBLAS 2.0 and demonstrate how CombBLAS 2.0 made important progress toward exascale.…”
Section: Results
confidence: 99%
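The "sparse-matrix primitives" mentioned above are kernels such as SpGEMM generalized over user-defined semirings. As a rough illustration only (this is not CombBLAS's API; the function name, arguments, and accumulator strategy below are assumptions), a single-node semiring SpGEMM can be sketched in Python:

```python
# Minimal sketch of a semiring SpGEMM, the kind of sparse-matrix
# primitive CombBLAS generalizes. NOTE: not CombBLAS's actual API;
# the semiring plumbing here is an illustrative assumption.
# Shown with the (min, +) semiring; MCL's expansion step is the
# same primitive over the ordinary (+, *) semiring.
import numpy as np
from scipy.sparse import csr_matrix

def semiring_spgemm(A, B, add=min, mul=lambda x, y: x + y, identity=np.inf):
    """Row-by-row C = A (x) B over a user-supplied semiring."""
    A, B = csr_matrix(A), csr_matrix(B)
    rows, cols, vals = [], [], []
    for i in range(A.shape[0]):
        acc = {}  # sparse accumulator for row i of C
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k, a = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[k], B.indptr[k + 1]):
                j, b = B.indices[kk], B.data[kk]
                acc[j] = add(acc.get(j, identity), mul(a, b))
        for j, v in acc.items():
            rows.append(i); cols.append(j); vals.append(v)
    return csr_matrix((vals, (rows, cols)), shape=(A.shape[0], B.shape[1]))
```

With add set to ordinary addition, mul to multiplication, and identity to 0, this reduces to the standard sparse product that each MCL expansion step performs.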
“…Other optimizations include faster memory-requirement estimation using approximate algorithms, and a binary merge scheme that spreads the merging of partial results across the stages of the SUMMA algorithm. A recent work [33] describes these optimizations in more detail.…”
Section: GPU Acceleration
confidence: 99%
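To make the binary merge idea concrete, here is a hedged single-node sketch: partial products from rank-k slabs (standing in for SUMMA stages) are kept on a stack and merged pairwise as the stages proceed, rather than all at once at the end. The stage partitioning, the size heuristic, and the function name are assumptions for illustration, not HipMCL's actual scheme:

```python
# Illustrative sketch, NOT HipMCL's implementation: compute C = A @ B
# in `stages` rank-k slabs and merge partial results pairwise as the
# stages proceed, so merge work is interleaved with the stages instead
# of piling up after the last one.
import numpy as np
from scipy import sparse

def summa_like_spgemm(A, B, stages=4):
    n = A.shape[1]
    bounds = np.linspace(0, n, stages + 1, dtype=int)
    stack = []  # partial results awaiting merging
    for s in range(stages):
        lo, hi = bounds[s], bounds[s + 1]
        stack.append(A[:, lo:hi] @ B[lo:hi, :])  # this stage's partial product
        # Merge the top two partials while they are comparably sized,
        # mimicking a binary merge tree spread across the stages.
        while len(stack) >= 2 and stack[-1].nnz >= stack[-2].nnz // 2:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    result = stack.pop()
    while stack:
        result = result + stack.pop()
    return result
```

Merging comparably sized partials keeps each individual merge cheap and avoids one large merge after the final stage, which is the cost the statement's scheme spreads out.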
“…A second thrust of ExaBiome involves protein clustering and annotation. ExaBiome's HipMCL [49] and PASTIS [50] codes (the latter developed jointly with ExaGraph) provide a scalable protein clustering pipeline, whereas a new prototype deep learning framework [51] shows promising results for functional annotation. HipMCL runs on thousands of nodes and effectively uses GPUs.…”
Section: ExaBiome
confidence: 99%
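HipMCL is a distributed-memory implementation of the Markov Clustering (MCL) algorithm that this paper optimizes. A minimal single-node sketch of the iteration HipMCL scales out (expansion via SpGEMM, inflation, pruning); the thresholds and the convergence test below are illustrative assumptions:

```python
# Single-node sketch of the MCL iteration that HipMCL distributes.
# Pruning and convergence parameters are illustrative assumptions.
import numpy as np
from scipy import sparse

def col_normalize(M):
    s = np.asarray(M.sum(axis=0)).ravel()
    s[s == 0] = 1.0
    return (M @ sparse.diags(1.0 / s)).tocsr()

def mcl(adjacency, inflation=2.0, prune=1e-4, max_iters=100, tol=1e-8):
    """Iterate MCL on an adjacency matrix until the iterate stabilizes."""
    A = sparse.csr_matrix(adjacency, dtype=float)
    A = A + sparse.eye(A.shape[0], format="csr")   # self-loops
    M = col_normalize(A)
    for _ in range(max_iters):
        M_new = M @ M                              # expansion: the dominant SpGEMM
        M_new = M_new.power(inflation)             # inflation: elementwise power
        M_new.data[M_new.data < prune] = 0.0       # pruning keeps iterates sparse
        M_new.eliminate_zeros()
        M_new = col_normalize(M_new)
        if abs(M_new - M).max() < tol:             # crude convergence test
            return M_new
        M = M_new
    return M
```

In the converged matrix, each row that retains nonzeros corresponds to an attractor, and its nonzero columns form one cluster. The expansion SpGEMM dominates the runtime, which is why the distributed SpGEMM optimizations discussed above matter.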
“…For GPU-equipped clusters, we developed a model to choose the fastest GPU-based SpGEMM depending on the sparsity of the current MCL iteration, and utilized a pipelined communication scheme that hides the cost of CPU-to-GPU data transfers. These advances, coupled with a distributed-memory implementation of a randomized output-structure prediction algorithm, resulted in orders-of-magnitude speedups compared to the original HipMCL (Selvitopi et al, 2020b).…”
Section: Algebraic Approaches for Graph Algorithms and Combinatorial Problems
confidence: 99%
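The statement does not reproduce the selection model itself; as a toy stand-in, a dispatcher can compute cheap sparsity statistics of the upcoming multiply (the exact flop count of an SpGEMM is computable in advance from the CSR structure) and branch on them. The thresholds and kernel names below are invented for the sketch; the paper fits a real model to measured GPU kernel performance:

```python
# Toy stand-in for sparsity-based SpGEMM kernel selection. The
# thresholds and kernel names are assumptions, not the paper's model.
import numpy as np
from scipy import sparse

def spgemm_flops(A, B):
    """Exact multiply count of C = A @ B, computable before running it."""
    A, B = sparse.csr_matrix(A), sparse.csr_matrix(B)
    row_nnz_B = np.diff(B.indptr)            # nonzeros per row of B
    return int(row_nnz_B[A.indices].sum())   # one multiply per (A nonzero, matching B-row entry)

def choose_spgemm_kernel(A, B, flop_threshold=1e8, density_threshold=1e-3):
    density = A.nnz / (A.shape[0] * A.shape[1])
    if density > density_threshold:
        return "hash-based kernel"    # denser mid-run MCL iterations
    if spgemm_flops(A, B) > flop_threshold:
        return "merge-based kernel"   # heavy work on very sparse operands
    return "row-by-row kernel"        # cheap default for early/late iterations
```

The flop count is exact: each nonzero A[i, k] contributes one multiplication per nonzero in row k of B, so it can be computed in O(nnz(A)) time without forming the output.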