Understanding wave propagation with respect to the structure of the Earth lies at the heart of many analyses, both in the oil and gas industry and for quantitative seismic hazard assessment. One of the most widely used techniques to solve the elastodynamics equation is the finite difference method, because of its simplicity and numerical efficiency. In the last two decades, the parallel efficiency of this numerical method has been demonstrated through many applications on various parallel platforms. The complexity induced by multicore platforms, both in terms of fine-grained parallelism and of the memory hierarchy, justifies revisiting these conclusions. In this paper, we underline the impact of such platforms on standard implementations.
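To make the "simplicity" of the finite difference method concrete, here is a minimal sketch of an explicit second-order scheme. It is an illustration under simplifying assumptions, not the scheme used in the paper: it solves the 1D scalar wave equation rather than the 3D elastodynamics equation, and the function name `step`, the grid size, and the values of `c`, `dx`, and `dt` are all hypothetical.

```python
def step(u_prev, u_curr, c, dx, dt):
    """Advance the 1D scalar wave equation u_tt = c^2 u_xx by one time step.

    Second-order centered differences in time and space; both end points
    are held at zero (fixed boundaries). Illustrative only.
    """
    r2 = (c * dt / dx) ** 2  # squared Courant number; must be <= 1 for stability
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + r2 * laplacian
    return u_next


# Hypothetical setup: 101 grid points, unit impulse in the middle, at rest.
n, c, dx, dt = 101, 1.0, 1.0, 0.5
u_curr = [0.0] * n
u_curr[50] = 1.0
u_prev = list(u_curr)  # same field at t = -dt, i.e. zero initial velocity

for _ in range(40):
    u_prev, u_curr = u_curr, step(u_prev, u_curr, c, dx, dt)
```

Each grid point is updated from its immediate neighbors only. This locality is what makes such stencils easy to partition across processes, and it is also why memory access patterns matter so much on multicore platforms.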
Finite difference methods are, in general, well suited to execution on parallel machines and are thus commonplace in High Performance Computing. Yet, despite their apparent regularity, they often exhibit load imbalance that damages their efficiency. In this article, we first characterize the spatial and temporal load imbalance of Ondes3D, a seismic wave propagation simulator used to conduct regional scale risk assessment. Our analysis reveals that this imbalance originates from the structure of the input data and from low-level CPU optimizations. Such dynamic imbalance should, therefore, be quite common and cannot be solved by any static approach or classical code reorganization. An effective solution for such scenarios, incurring minimal code modification, is to use AMPI/CHARM++. By over-decomposing the application, the CHARM++ runtime can dynamically rebalance the load by migrating data and computation at the granularity of an MPI rank. We show that this approach is effective at balancing the spatial/temporal dynamic load of the application, thus drastically reducing its execution time. However, this approach requires a careful selection of the load balancing algorithm, its activation frequency, and the over-decomposition level. These choices are, unfortunately, quite dependent on application structure and platform characteristics (i.e., number of processors and their speed; network topology, bandwidth, and latency). Therefore, we propose a methodology that leverages the capabilities of the SimGrid simulation framework and makes it possible to conduct such a study at low experimental cost. Our approach relies on a combination of emulation, simulation, and application modeling that requires minimal code modification and yet manages to capture both spatial and temporal load imbalance and to faithfully predict the performance of dynamic load balancing.
We evaluate the quality of our simulation by systematically comparing simulation results with the outcome of real executions and demonstrate how this approach can be used to quickly find the optimal load balancing configuration for a given application/hardware configuration.
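The benefit of over-decomposition can be illustrated with a small sketch. It assumes a generic greedy heuristic (heaviest chunk first, onto the least-loaded processor) and is not the CHARM++ runtime's implementation or any of its APIs. The per-chunk costs, the processor count, and the function names are hypothetical, and migration cost is ignored.

```python
import heapq


def imbalance(loads):
    """Ratio of the maximum load to the mean load; 1.0 is perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


def block_mapping(costs, n_procs):
    """Static mapping: each processor gets one contiguous block of chunks."""
    per_proc = len(costs) // n_procs
    return [sum(costs[p * per_proc:(p + 1) * per_proc]) for p in range(n_procs)]


def greedy_mapping(costs, n_procs):
    """Dynamic-style mapping: heaviest chunk goes to the least-loaded processor."""
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    for cost in sorted(costs, reverse=True):
        load, p = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, p))
    return [load for load, _ in sorted(heap, key=lambda item: item[1])]


# Hypothetical measured costs for 16 chunks on 4 processors (over-decomposition
# factor of 4). The first chunks are heavier, mimicking spatial imbalance.
costs = [8, 8, 6, 6, 4, 4, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1]

static_loads = block_mapping(costs, 4)
balanced_loads = greedy_mapping(costs, 4)

print("static  :", static_loads, "imbalance =", round(imbalance(static_loads), 2))
print("balanced:", balanced_loads, "imbalance =", round(imbalance(balanced_loads), 2))
```

With a single chunk per processor there would be nothing to move. Splitting the domain into more chunks than processors is what gives a balancer the freedom to even out the load. In practice the chunk costs change over time, which is why the activation frequency of the balancer matters as well.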
Summary
Finite‐difference methods are commonplace in High Performance Computing applications. Despite their apparent regularity, they often exhibit load imbalance that damages their efficiency. We characterize the spatial and temporal load imbalance of Ondes3D, a typical finite‐difference application dedicated to earthquake modeling. Our analysis reveals imbalance originating from the structure of the input data and from low‐level CPU optimizations. Ondes3D was successfully ported to AMPI/CHARM++ using over‐decomposition and MPI process migration techniques to dynamically rebalance the load. However, this approach requires careful selection of the over‐decomposition level, the load balancing algorithm, and its activation frequency. These choices are usually tied to application structure and platform characteristics. In this article, we propose a workflow that leverages the capabilities of SimGrid to conduct such a study at low experimental cost. We rely on a combination of emulation, simulation, and application modeling that requires minimal code modification and manages to capture both spatial and temporal load imbalance to faithfully predict the performance of dynamic load balancing. We evaluate the quality of our simulation by comparing simulation results with the outcome of real executions and demonstrate how this approach can be used to quickly find the optimal load balancing configuration for a given application/hardware configuration.