Multiple Flows of Control in Migratable Parallel Programs

Zheng, Gengbin; Kalé, Laxmikant V.; Lawlor, Orion Sky

doi:10.1109/icppw.2006.58

Cited by 7 publications

(2 citation statements)

References 21 publications

(26 reference statements)


“…Adaptive MPI (AMPI) is an implementation of the MPI standard on top of Charm++ (Huang et al 2003, 2006; Zheng et al 2006). As abovementioned, the developer only sees the virtual processors while the mapping of virtual processors to physical processors is handled by the Charm++ runtime system.…”

Section: Adaptive MPI (mentioning)

confidence: 99%

A New Approach to Load Balance for Parallel Compositional Simulation Based on Reservoir Model Over-decomposition

Wang

Killough

2013

All Days


The quest for efficient and scalable parallel reservoir simulators has been evolving with the advancement of high performance computing architectures. Among the various challenges of efficiency and scalability, load imbalance is a major obstacle that has not been fully addressed and solved. The reasons that cause load imbalance in parallel reservoir simulation are both static and dynamic. Robust graph partitioning algorithms are capable of handling static load imbalance by decomposing the underlying reservoir geometry to distribute a roughly equal load to each processor. However, these loads determined by a static load balancer seldom remain unchanged as the simulation proceeds in time. This so-called dynamic imbalance can be further exacerbated in parallel compositional simulations. The flash calculations for equations of state in complex compositional simulations not only can consume over half of the total execution time but also are difficult to balance merely by a static load balancer. The computational cost of flash calculations in each grid block heavily depends on the dynamic data such as pressure, temperature, and hydrocarbon composition. Thus, any static assignment of grid blocks may lead to dynamic load imbalance in unpredictable manners. A dynamic load balancer can often provide solutions for this difficulty. However, traditional techniques are inflexible and tedious to implement in legacy reservoir simulators. In this paper, we present a new approach to address dynamic load imbalance in parallel compositional simulation. It overdecomposes the reservoir model to assign each processor a bundle of subdomains. Processors treat these bundles of subdomains as virtual processes or user-level migratable threads which can be dynamically migrated across processors in the run-time system. This technique is shown to be capable of achieving better overlap between computation and communication for cache efficiency.
We employ this approach in a legacy reservoir simulator and demonstrate reduction in the execution time of parallel compositional simulations while requiring minimal changes to the source code. Finally, it is shown that domain over-decomposition together with a load balancer can improve speedup from 29.27 to 62.38 on 64 physical processors for a realistic simulation problem.
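The over-decomposition idea in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which load-balancing strategy is used, so a simple greedy heuristic (heaviest subdomain to the least-loaded processor) is assumed here, and the function names and cost values are hypothetical. The sketch only compares per-processor load totals; in AMPI the runtime system would migrate the threads themselves.

```python
import heapq


def static_block_assignment(costs, num_procs):
    """Give each processor a contiguous chunk of subdomains, ignoring cost."""
    chunk = -(-len(costs) // num_procs)  # ceiling division
    loads = [0.0] * num_procs
    for i, cost in enumerate(costs):
        loads[i // chunk] += cost
    return loads


def greedy_rebalance(costs, num_procs):
    """Assumed strategy: place the heaviest remaining subdomain on the
    currently least-loaded processor (a min-heap keyed on load)."""
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, proc = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, proc))
    loads = [0.0] * num_procs
    for load, proc in heap:
        loads[proc] = load
    return loads


def imbalance(loads):
    """Ratio of the maximum load to the mean load (1.0 is perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Hypothetical costs: 16 subdomains on 4 processors, where the first
    # four subdomains are expensive (e.g. flash-heavy grid blocks).
    costs = [8.0] * 4 + [1.0] * 12
    print("static :", static_block_assignment(costs, 4))
    print("greedy :", greedy_rebalance(costs, 4))
```

With these made-up costs the contiguous assignment puts all four expensive subdomains on one processor, while the greedy placement spreads them out. Having more subdomains than processors is what gives the balancer room to do this.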


“…AMPI is an implementation of the MPI standard on top of Charm++ (Huang et al 2003, 2006; Zheng et al 2006). As previously mentioned, the developer sees only the virtual processors whereas the mapping of virtual processors to physical processors is handled by the Charm++ run-time system.…”

Section: Introduction (mentioning)

confidence: 99%

A New Approach to Load Balance for Parallel/Compositional Simulation Based on Reservoir-Model Overdecomposition

Wang

Killough

2013

SPE Journal


The quest for efficient and scalable parallel reservoir simulators has been evolving with the advancement of high-performance computing architectures. Among the various challenges of efficiency and scalability, load imbalance is a major obstacle that has not been fully addressed and solved. The causes of load imbalance in parallel reservoir simulation are both static and dynamic. Robust graph-partitioning algorithms are capable of handling static load imbalance by decomposing the underlying reservoir geometry to distribute a roughly equal load to each processor. However, these loads that are determined by a static load balancer seldom remain unchanged as the simulation proceeds in time. This so-called dynamic imbalance can be exacerbated further in parallel compositional simulations. The flash calculations for equations of state (EOSs) in complex compositional simulations not only can consume more than half of the total execution time but also are difficult to balance merely by a static load balancer. The computational cost of flash calculations in each gridblock heavily depends on the dynamic data such as pressure, temperature, and hydrocarbon composition. Thus, any static assignment of gridblocks may lead to dynamic load imbalance in unpredictable manners. A dynamic load balancer can often provide solutions for this difficulty. However, traditional techniques are inflexible and tedious to implement in legacy reservoir simulators. In this paper, we present a new approach to address dynamic load imbalance in parallel compositional simulation. It overdecomposes the reservoir model to assign each processor a bundle of subdomains. Processors treat these bundles of subdomains as virtual processes or user-level migratable threads that can be dynamically migrated across processors in the run-time system. This technique is shown to be capable of achieving better overlap between computation and communication for cache efficiency.
We use this approach in a legacy reservoir simulator and demonstrate a reduction in the execution time of parallel compositional simulations while requiring minimal changes to the source code. Finally, it is shown that domain overdecomposition, together with a load balancer, can improve speedup from 29.27 to 62.38 on 64 physical processors for a realistic simulation problem.
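For context, the two reported speedups can be converted to parallel efficiency using the standard definition (speedup divided by processor count). The efficiency figures below are derived here and are not stated in the abstract; the symbols S, P, and E are introduced only for this calculation.

```latex
E = \frac{S}{P}, \qquad
E_{\text{before}} = \frac{29.27}{64} \approx 0.457, \qquad
E_{\text{after}} = \frac{62.38}{64} \approx 0.975, \qquad
\frac{S_{\text{after}}}{S_{\text{before}}} = \frac{62.38}{29.27} \approx 2.13
```

Under that definition, efficiency on 64 physical processors rises from roughly 46% to roughly 97%.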


Charm++ and AMPI: Adaptive Runtime Strategies via Migratable Objects

Kalé

Zheng

2009

Advanced Computational Infrastructures for Parallel and Distributed Adaptive Applications

