Mohammed Al Farhan scite author profile

Algorithmic and architecture-oriented optimizations are essential for achieving performance worthy of anticipated energy-austere exascale systems. In this paper, we present an extreme scale FMM-accelerated boundary integral equation solver for wave scattering, which uses FMM as a matrix-vector multiplication inside the GMRES iterative method. Our FMM Helmholtz kernels are capable of treating nontrivial singular and near-field integration points. We implement highly optimized kernels for both shared and distributed memory, targeting emerging Intel extreme performance HPC architectures. We extract the potential thread-and data-level parallelism of the key Helmholtz kernels of FMM. Our application code is well optimized to exploit the AVX-512 SIMD units of Intel Skylake and Knights Landing architectures. We provide different performance models for tuning the task-based tree traversal implementation of FMM, and develop optimal architecturespecific and algorithm aware partitioning, load balancing, and communication reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. With shared memory optimizations, we achieve roughly 77% of peak single precision floating point performance of a 56-core Skylake processor, and on average 60% of peak single precision floating point performance of a 72-core KNL. These numbers represent nearly 5.4x and 10x speedup on Skylake and KNL, respectively, compared to the the baseline scalar code. With distributed memory optimizations, on the other hand, we report near-optimal efficiency in the weak scalability study with respect to both the O(log P ) communication complexity as well as the theoretical scaling complexity of FMM. In addition, we exhibit up to 85% efficiency in strong scaling. We compute in excess of 2 billion DoF on the full-scale of the Cray XC40 supercomputer. The numerical results match the analytical solution with convergence at 1.0e-4 relative 2-norm residual accuracy. To the best of our knowledge, this work presents the fastest and the most scalable FMM-accelerated linear solver for oscillatory kernels.

show abstract

Unstructured computational aerodynamics on many integrated core architecture

Farhan

Kaushik

Keyes

2016

Parallel Computing

View full text Add to dashboard Cite

Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

Abduljabbar

Farhan

Yokota

et al. 2017

View full text Add to dashboard Cite

MAGMA templates for scalable linear algebra on emerging architectures

Farhan

Abdelfattah

Tomov

et al. 2020

The International Journal of High Performance Computing Applica

View full text Add to dashboard Cite

With the acquisition and widespread use of more resources that rely on accelerator/wide vector–based computing, there has been a strong demand for science and engineering applications to take advantage of these latest assets. This, however, has been extremely challenging due to the diversity of systems to support their extreme concurrency, complex memory hierarchies, costly data movement, and heterogeneous node architectures. To address these challenges, we design a programming model and describe its ease of use in the development of a new MAGMA Templates library that delivers high-performance scalable linear algebra portable on current and emerging architectures. MAGMA Templates derives its performance and portability by (1) building on existing state-of-the-art linear algebra libraries, like MAGMA, SLATE, Trilinos, and vendor-optimized math libraries, and (2) providing access (seamlessly to the users) to the latest algorithms and architecture-specific optimizations through a single, easy-to-use C++-based API.

show abstract

An algorithm for reduct cardinality minimization

AbouEisha

Farhan

Chikalov

et al. 2013

View full text Add to dashboard Cite

Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering

Abduljabbar¹,

Farhan²,

Al-Harthi³

et al. 2018

Preprint

View full text Add to dashboard Cite

Unstructured Computations on Emerging Architectures

Farhan¹

2019

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.