Abstract. Algebraic Multigrid (AMG) solvers are an essential component of many large-scale scientific simulation codes. Their continued numerical scalability and efficient implementation are critical for preparing these codes for exascale. Our experience on modern multi-core machines shows that significant challenges must be addressed for AMG to perform well on such machines. We discuss our experiences and describe the techniques we have used to overcome scalability challenges for AMG on hybrid architectures in preparation for exascale.

1. Introduction. Sparse iterative linear solvers are critical for large-scale scientific simulations, many of which spend the majority of their run time in solvers. Algebraic Multigrid (AMG) is a popular solver because of its linear run-time complexity and its proven scalability in distributed-memory environments. However, changing supercomputer architectures present challenges to AMG's continued scalability.

Multi-core processors are now standard on commodity clusters and high-end supercomputers alike, and core counts are increasing rapidly. However, distributed-memory message-passing implementations, such as MPI, are not expected to work efficiently with more than hundreds of thousands of tasks. With exascale machines expected to have hundreds of millions or billions of tasks and hundreds of tasks per node, programming models will necessarily be hierarchical, combining shared memory on local nodes with message passing in a larger distributed-memory environment.

With exascale in mind, we have begun to focus on the performance of BoomerAMG [14], the AMG solver in the hypre [15] library, on multicore architectures.
BoomerAMG has demonstrated good weak scalability in distributed-memory environments, for example on 125,000 processors of BG/L [8] and on BG/P [5], but our preliminary study [7] has shown that non-uniform memory access (NUMA) latency between sockets, deep cache hierarchies, multiple memory controllers, and reduced on-node bandwidth can be detrimental to AMG's performance.

To achieve high performance on exascale machines, we will need to ensure numerical scalability and an efficient implementation as core counts increase, memory capacity per core decreases, and on-node cache architectures become more complex. Some components of AMG that lead to very good convergence do not parallelize well or depend on the number of processors. In Section 3 we examine the effect of high levels of parallelism involving large numbers of cores on one of AMG's most important components, its smoothers. We also develop a performance model of the AMG solve cycle to better understand AMG's performance bottlenecks (Section 4) and use it to evaluate new AMG variants (Section 5). Since our investigations show that the increasing communication complexity on coarser grids, combined with the effects of increasing numbers of cores, leads to severe performance bottlenecks for AMG on various multicore architectures, we investigate two different approaches to reduce communication in AMG: an AMG variant, which we denote as the "redundant c...
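To make the structure of the multigrid solve cycle discussed above concrete, the following sketch implements a two-level V-cycle with a weighted-Jacobi smoother for a 1D Poisson problem. This is a minimal illustration only: it uses a fixed geometric interpolation operator in place of the algebraically constructed interpolation that AMG (and BoomerAMG in particular) builds from the matrix, and all names (`poisson1d`, `two_level_vcycle`, etc.) are illustrative, not part of the hypre API.

```python
import numpy as np

def poisson1d(n):
    """Dense 1D Poisson matrix (Dirichlet BCs) on n interior points."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def jacobi(A, x, b, iters, omega=2.0 / 3.0):
    """Weighted-Jacobi smoother: damps high-frequency error components."""
    D = np.diag(A)
    for _ in range(iters):
        x = x + omega * (b - A @ x) / D
    return x

def interpolation(n_coarse):
    """Linear interpolation from n_coarse points to 2*n_coarse + 1 fine points.

    (An AMG code would instead construct P algebraically from A.)
    """
    nf = 2 * n_coarse + 1
    P = np.zeros((nf, n_coarse))
    for j in range(n_coarse):
        i = 2 * j + 1          # fine index coinciding with coarse point j
        P[i, j] = 1.0
        P[i - 1, j] += 0.5     # neighboring fine points get averaged values
        P[i + 1, j] += 0.5
    return P

def two_level_vcycle(A, b, x, P):
    """One V-cycle: pre-smooth, coarse-grid correction, post-smooth."""
    Ac = P.T @ A @ P                   # Galerkin coarse-grid operator
    x = jacobi(A, x, b, 2)             # pre-smoothing
    r = b - A @ x                      # fine-grid residual
    ec = np.linalg.solve(Ac, P.T @ r)  # solve restricted problem exactly
    x = x + P @ ec                     # prolongate correction to fine grid
    return jacobi(A, x, b, 2)          # post-smoothing
```

Even this two-level toy exhibits the behavior the text relies on: the smoother reduces only high-frequency error, the coarse-grid correction handles the smooth components, and the combination converges at a rate independent of problem size. In a full AMG hierarchy the coarse solve is itself replaced by a recursive V-cycle, which is where the coarse-grid communication bottlenecks examined later arise.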