We consider the problem of communication avoidance when computing interactions between a set of particles, both with and without a cutoff radius for interaction. Our strategy, which we show to be communication-optimal, divides the work over the iteration space rather than simply dividing the particles over processors, so more than one processor may be responsible for computing updates to a single particle. Like a force decomposition in molecular dynamics, this approach requires √p times more memory than a particle decomposition, but reduces communication costs by a factor of √p and is often faster in practice than a particle decomposition [1]. We examine a generalized force-decomposition algorithm that tolerates the memory-limited case, i.e., when memory can hold only c copies of the particles for c = 1, 2, ..., √p. When c = 1, the algorithm reduces to a particle decomposition; when c = √p, it becomes a force decomposition. We prove that the algorithm is communication-optimal and reduces critical-path latency and bandwidth costs by factors of c² and c, respectively. Performance results from experiments on up to 24K cores of Cray XE6 and IBM BlueGene/P machines indicate that the algorithm reduces communication in practice. In some cases it even outperforms the original force decomposition, because the right choice of c strikes a balance between collective and point-to-point communication costs. Finally, we extend the analysis to include a cutoff radius for direct evaluation of force interactions and show that communication optimality still holds. We describe a generalized algorithm for multi-dimensional space and assess its performance for 1D and 2D simulations on the same systems.
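To make the idea concrete, here is a minimal sequential sketch of dividing the n × n interaction iteration space over a (p/c) × c grid of ranks, so that several ranks accumulate partial forces for the same particle. The grid layout, the toy 1D inverse-square force law, and all names below are illustrative assumptions, not the paper's exact algorithm or communication schedule.

```python
# Sketch: partition the n x n iteration space (pairs of particles) into a
# (p//c) x c grid of blocks; each "rank" computes partial forces for its block,
# and the per-rank contributions are summed, mimicking the reduction step.
import numpy as np

def pair_forces(xi, xj):
    """Toy 1D inverse-square forces of particles xj acting on particles xi."""
    d = xj[None, :] - xi[:, None]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(d != 0.0, np.sign(d) / d**2, 0.0)

def forces_reference(x):
    return pair_forces(x, x).sum(axis=1)          # plain O(n^2) evaluation

def forces_c_replicated(x, p, c):
    """Sum per-rank partial forces from a hypothetical (p//c) x c rank grid."""
    n, total = len(x), np.zeros(len(x))
    row_blocks = np.array_split(np.arange(n), p // c)  # particles a rank updates
    col_blocks = np.array_split(np.arange(n), c)       # particles it reads (c copies)
    for rb in row_blocks:
        for cb in col_blocks:
            total[rb] += pair_forces(x[rb], x[cb]).sum(axis=1)
    return total

x = np.random.default_rng(0).random(32)
assert np.allclose(forces_reference(x), forces_c_replicated(x, p=8, c=2))
```

With c = 1 the column dimension collapses and each rank owns whole particles (a particle decomposition); with c = √p the blocks form a square grid (a force decomposition), matching the two limiting cases described above.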
Multiplication of a sparse matrix with a dense matrix is a building block of a growing number of applications in areas such as machine learning and graph algorithms. However, most previous work on parallel matrix multiplication considers the cases where both operands are dense or both are sparse. This paper analyzes communication lower bounds and compares the communication costs of classical parallel algorithms in the context of sparse-dense matrix-matrix multiplication. We also present new communication-avoiding algorithms based on a 1D decomposition, called 1.5D, which, while suboptimal in the dense-dense and sparse-sparse cases, outperform the 2D and 3D variants both theoretically and in practice for sparse-dense multiplication. Our analysis separates one-time costs from per-iteration costs in an iterative machine learning context. Experiments demonstrate speedups of up to 100x over a baseline 3D SUMMA implementation and show parallel scaling to over 10 thousand cores.
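The following is a minimal sequential sketch of a 1D-based ("1.5D"-style) blocking for the sparse-dense product C = A @ B: the sparse A is walked in column panels, the dense B in matching row panels, and each of the c replicas covers only a 1/c fraction of the panel shifts. The panel layout, loop structure, and names are assumptions for illustration; they do not reproduce the paper's exact algorithms or communication schedule.

```python
# Sketch: a blocked sparse-dense multiply where c "replicas" split the ring of
# panel shifts among themselves; the per-replica partial products sum to A @ B.
import numpy as np
import scipy.sparse as sp

def spmm_1_5d_sketch(A, B, p=8, c=2):
    q = p // c                                   # panels per replica ring
    splits = np.array_split(np.arange(A.shape[1]), q)  # column panels of A / row panels of B
    C = np.zeros((A.shape[0], B.shape[1]))
    for r in range(c):                           # each replica handles q // c shifts
        for s in range(r, q, c):
            cols = splits[s]
            C += A[:, cols] @ B[cols, :]         # one "shift" of the ring
    return C

A = sp.random(256, 256, density=0.02, format="csr", random_state=0)
B = np.random.default_rng(0).random((256, 64))
assert np.allclose(spmm_1_5d_sketch(A, B), A @ B)
```

In a distributed setting the inner shifts correspond to point-to-point panel exchanges while the replication factor c trades extra memory for fewer exchanges, which is the balance the abstract refers to.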
Sparse tensors arise in problems in science, engineering, machine learning, and data analytics. Programs that operate on such tensors can exploit sparsity to reduce storage requirements and computational time. Developing and maintaining sparse software by hand, however, is a complex and error-prone task. Therefore, we propose treating sparsity as a property of tensors, not a tedious implementation task, and letting a sparse compiler generate sparse code automatically from a sparsity-agnostic definition of the computation. This paper discusses integrating this idea into MLIR.
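As a small Python analogy (not MLIR code, and not the compiler approach itself), the snippet below illustrates what "sparsity as a property of tensors" means for the user: the computation is written once, agnostic to storage, and the operand's type, not the code, decides whether a sparse kernel is used. All names are illustrative.

```python
# Analogy: a sparsity-agnostic definition of y = A @ v; the dense and sparse
# operands reuse the same code, and sparsity is a property of the data.
import numpy as np
import scipy.sparse as sp

def matvec(A, v):
    """Written once; works for dense ndarray A and sparse CSR A alike."""
    return A @ v

rng = np.random.default_rng(0)
dense_A = rng.random((1000, 1000))
dense_A[dense_A < 0.99] = 0.0          # make roughly 99% of entries zero
sparse_A = sp.csr_matrix(dense_A)      # same values, compressed sparse storage
v = np.ones(1000)

assert np.allclose(matvec(dense_A, v), matvec(sparse_A, v))
```

A sparse compiler pushes this idea further: instead of dispatching to a library kernel at runtime, it generates the sparse loop nests automatically from the dense-looking definition and the tensors' sparsity annotations.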