Amani AlOnazi scite author profile

Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of Pregel, Mizan does not assume any a priori knowledge of the structure of the graph or behavior of the algorithm. Instead, it monitors the runtime characteristics of the system. Mizan then performs efficient fine-grained vertex migration to balance computation and communication. We have fully implemented Mizan; using extensive evaluation we show that, especially for highly dynamic workloads, Mizan provides up to 84% improvement over techniques leveraging static graph pre-partitioning.
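The abstract above describes monitoring runtime behavior and then migrating vertices between workers. The following is a minimal sketch of that general idea, not Mizan's actual algorithm: the function name `plan_migrations`, the per-vertex cost metric, and the `tolerance` threshold are all assumptions made for illustration.

```python
from statistics import mean


def plan_migrations(assignment, vertex_cost, tolerance=0.2):
    """Plan vertex moves after one superstep (illustrative only).

    assignment:  dict mapping vertex -> worker id (hypothetical layout)
    vertex_cost: dict mapping vertex -> cost observed in the last superstep
    tolerance:   allowed relative deviation from the mean worker load
    Returns a list of (vertex, source_worker, target_worker) tuples.
    """
    # Aggregate the observed per-vertex cost into a per-worker load.
    load = {}
    for vertex, worker in assignment.items():
        load[worker] = load.get(worker, 0.0) + vertex_cost[vertex]

    avg = mean(load.values())
    src = max(load, key=load.get)  # most loaded worker
    dst = min(load, key=load.get)  # least loaded worker

    # Consider the costliest vertices on the overloaded worker first.
    candidates = sorted(
        (v for v, w in assignment.items() if w == src),
        key=vertex_cost.get,
        reverse=True,
    )

    moves = []
    for vertex in candidates:
        # Stop as soon as the source worker is within tolerance of the mean.
        if load[src] <= avg * (1 + tolerance):
            break
        cost = vertex_cost[vertex]
        # Only move a vertex if the target stays lighter than the source.
        if load[dst] + cost < load[src]:
            load[src] -= cost
            load[dst] += cost
            moves.append((vertex, src, dst))
    return moves
```

A real system would also have to weigh communication cost and the price of the migration itself; this sketch balances only a single scalar cost per vertex.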

Asynchronous Task-Based Parallelization of Algebraic Multigrid

AlOnazi

Markomanolis

Keyes

2017

Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry

AlOnazi

Ltaief

Keyes

et al. 2019

We propose a new framework for deploying Reverse Time Migration (RTM) simulations on distributed-memory systems equipped with multiple GPUs. The infrastructure engine of our software, TB-RTM, relies on the STARPU dynamic runtime system to orchestrate the asynchronous scheduling of RTM computational tasks on the underlying resources. Besides dealing with the challenging hardware heterogeneity, TB-RTM supports tasks with different workload characteristics, which stress disparate components of the hardware system. RTM is challenging in that it operates intensively at both ends of the memory hierarchy, with compute kernels running at the highest level of the memory system, possibly in GPU main memory, while I/O kernels are saving solution data to fast storage. We consider how to span the wide performance gap between the two extreme ends of the memory system, i.e., GPU memory and fast storage, on which large-scale RTM simulations routinely execute. To maximize hardware occupancy while maintaining high memory bandwidth throughout the memory subsystem, our framework presents the new out-of-core (OOC) feature from STARPU to prefetch data solutions in and out not only from/to the GPU/CPU main memory but also from/to the fast storage system. The OOC technique may trigger opportunities for overlapping expensive data movement with computations. The TB-RTM framework addresses this challenging problem of heterogeneity with a systematic approach that is oblivious to the targeted hardware architectures. Our resulting RTM framework can effectively be deployed on massively parallel GPU-based systems, while delivering performance scalability up to 500 GPUs.
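The overlap of data movement with computation mentioned in the abstract can be illustrated with a simple prefetching loop. This is a hedged sketch in plain Python, not the STARPU API and not TB-RTM code: `run_with_prefetch`, `load_block`, and `compute` are hypothetical names, and a single background thread stands in for the I/O path.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_prefetch(block_ids, load_block, compute):
    """Process blocks in order while prefetching the next one.

    block_ids:  sequence of block identifiers (hypothetical)
    load_block: callable that fetches one block from slow storage
    compute:    callable that processes one loaded block
    """
    block_ids = list(block_ids)
    results = []
    if not block_ids:
        return results

    with ThreadPoolExecutor(max_workers=1) as io_pool:
        # Start loading the first block before any computation begins.
        pending = io_pool.submit(load_block, block_ids[0])
        for next_id in block_ids[1:] + [None]:
            data = pending.result()  # wait for the current block
            if next_id is not None:
                # Issue the next load so it overlaps with compute below.
                pending = io_pool.submit(load_block, next_id)
            results.append(compute(data))
    return results
```

For example, `run_with_prefetch([1, 2, 3], lambda i: [i] * 3, sum)` loads each block on the background thread while the previous block is being summed.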

Design and Optimization of OpenFOAM-based CFD Applications for Modern Hybrid and Heterogeneous HPC Platforms

AlOnazi

Keyes

Lastovetsky

et al. 2014

Performance Assessment of Hybrid Parallelism for Large-Scale Reservoir Simulation on Multi- and Many-core Architectures

AlOnazi

Rogowski

Al-Zawawi

et al. 2018

Two trends are reshaping the landscape of petroleum reservoir simulators, one architecturally and one application driven: an increasing number of cores per node and increasing computational intensity arising from higher fidelity physics at each cell. Implicit algebraic solvers being the dominant kernels, we present hybrid MPI and OpenMP implementations of the linear solver of GigaPOWERS, a full-scale real-world massively parallel simulator for black oil and composition models. We also evaluate the impact of explicit communication and computation overlap by including the halo exchange in the task-dependency graph. We analyze the performance of these modifications across multi- and many-core architectures, i.e., Intel Haswell, Skylake, and Knights Landing, using a variety of synthetic and real-world models. The hybrid approach results in up to 50% reduction of time to solution on a 16 million-cell SPE10-like model on Skylake, whereas on a smaller, 1 million-cell model on Haswell and Knights Landing both implementations achieve very similar performance. In the real-world reservoir simulations, the hybrid parallelism has reduced communication volume and memory consumption and has improved load balancing.
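Treating the halo exchange as a task that runs alongside interior work can be sketched on a toy 1D stencil. This is only an illustration under stated assumptions, not the GigaPOWERS solver and not MPI or OpenMP code: `smooth_step` and `fetch_halo` are hypothetical names, and a background thread stands in for the communication task.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_step(local, fetch_halo):
    """One 3-point averaging step on a local 1D subdomain.

    local:      list of floats owned by this process
    fetch_halo: callable returning (left_ghost, right_ghost) from neighbours
    """
    n = len(local)
    if n == 0:
        return []
    new = [0.0] * n

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Communication task: start the halo exchange first.
        halo = pool.submit(fetch_halo)
        # Interior cells need no ghost values, so they run during the exchange.
        for i in range(1, n - 1):
            new[i] = (local[i - 1] + local[i] + local[i + 1]) / 3.0
        # Boundary cells depend on the halo task having completed.
        left, right = halo.result()

    padded = [left] + list(local) + [right]
    for i in (0, n - 1):
        new[i] = (padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
    return new
```

The dependency structure is the point here: only the two boundary updates wait on the exchange, so the larger the interior, the more communication time can be hidden.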
