Combining Task-based Parallelism and Adaptive Mesh Refinement Techniques in Molecular Dynamics Simulations

Prat, Raphaël; Colombet, Laurent; Namyst, Raymond

doi:10.1145/3225058.3225085

Cited by 12 publications

(9 citation statements)

References 29 publications


“…SFC-based domain decompositions yield an efficient partitioning scheme for stencil-like algorithms [200,201]. Adaptive variants, however, have not consistently proven to be beneficial [202][203][204], because of the more involved neighbor search. Additionally, in a short-range MD simulation, the main portion of the computational load is generated by the number of force pairs and only loosely coupled to the number of cells.…”

Section: Molecular Dynamics (mentioning)

confidence: 99%

Adaptive grid implementation for parallel continuum mechanics methods in particle simulations

Mehl; Lahnert (2019). Eur. Phys. J. Spec. Top.
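To illustrate the SFC-based domain decomposition mentioned in the citation statement above, here is a minimal sketch. It assumes a Morton (Z-order) curve, which is only one possible space-filling curve and not necessarily the one used in the cited works. The function names and the per-cell `weight` (standing in for an estimate of the number of force pairs in a cell) are hypothetical.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three cell indices into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def partition_by_weight(cells, n_parts):
    """Split cells into n_parts contiguous runs along the Morton curve.

    cells: list of ((ix, iy, iz), weight) pairs, where weight is a
    hypothetical per-cell cost such as an estimated force-pair count.
    Returns a list of n_parts lists of cell index tuples.
    """
    ordered = sorted(cells, key=lambda c: morton_key(*c[0]))
    total = sum(w for _, w in ordered)
    if total <= 0:
        raise ValueError("total weight must be positive")
    parts = [[] for _ in range(n_parts)]
    running = 0.0
    for idx, w in ordered:
        # Assign each cell by the midpoint of its interval of cumulative weight.
        p = min(int((running + 0.5 * w) * n_parts / total), n_parts - 1)
        parts[p].append(idx)
        running += w
    return parts
```

Weighting the cut points by cost rather than by cell count reflects the statement's remark that the load depends mainly on the number of force pairs and only loosely on the number of cells.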


“…Regarding affinity, it can be seen that it is advisable to select one of the available strategies instead of delegating the distribution to the operating system (none). Unlike scatter, balanced and compact guarantee the proximity among OpenMP threads with consecutive identifiers, minimizing in this way the data communication that each thread requires. As it was mentioned in Section 4.3, the compiler detects false dependencies in that loop and it is not able to generate SIMD binary code by itself.…”

Section: Performance Results On the Intel Xeon Phi 7230 (mentioning)

confidence: 99%

“…Nowadays, the scientific community is experimenting with a new revolution on parallel processor technologies in the road to the Exascale. The novelties and enhancements not only involve hardware technologies but also changes in parallel programming models [6]. Beyond that, one of the most important challenges that still remains is how to perform large-scale simulations in a reasonable time using affordable computer systems.…”

Section: Introduction (mentioning)

confidence: 99%


Optimization of the N-Body Simulation on Intel’s Architectures Based on AVX-512 Instruction Set

Rucci; Moreno; Pousa; et al. (2020). Computer Science – CACIC 2019


The N-body simulations have become a powerful tool to test the gravitational interaction among particles, ranging from a few bodies to complete galaxies. Even though N-body has already been optimized on many parallel platforms, there are hardly any studies which take advantage of the latest Intel architectures based on AVX-512 instruction set. This SIMD set was initially supported by Intel's Xeon Phi Knights Landing (KNL) manycore processors launched in 2016. Recently, it has been included in Intel's general-purpose processors too, starting at the Skylake (SKL) server microarchitecture and now in its successor Cascade Lake (CKL). This paper optimizes the all-pairs N-body simulation on both current Intel platforms supporting AVX-512 extensions: a Xeon Phi KNL node and a server equipped with a dual CKL processor. On the basis of a naive implementation, it is shown how the parallel implementation can reach, through different optimization techniques, 2355 and 2449 GFLOPS on the Xeon Phi KNL and the Xeon CKL platforms, respectively.
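For readers unfamiliar with the all-pairs formulation the abstract refers to, the following is a minimal, unoptimized sketch of the O(N²) gravitational acceleration computation. It is not the cited paper's implementation and contains none of its AVX-512 or threading optimizations. The function name, the default `G=1.0`, and the `softening` parameter are assumptions made for illustration.

```python
import math


def accelerations(pos, mass, G=1.0, softening=0.0):
    """Naive all-pairs gravitational accelerations.

    pos: list of [x, y, z] positions; mass: list of masses.
    softening is a hypothetical small length added in quadrature to the
    distance to avoid division by zero for coincident bodies.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dz = pos[j][2] - pos[i][2]
            r2 = dx * dx + dy * dy + dz * dz + softening * softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            s = G * mass[j] * inv_r3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

Every body visits every other body, so the cost grows quadratically with the number of bodies. This regular inner loop is the kind of code that SIMD instruction sets target.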


“…A communication thread per rank is used to coordinate the work-stealing, while OpenMP tasks conduct computations. Prat et al [22] studied the taskification of computations in AMR applications using OpenMP tasks and dependencies, combined with cache blocking and vectorization techniques. However, they did not include the study of communication patterns.…”

Section: Related Work (mentioning)

confidence: 99%

Towards Data-Flow Parallelization for Adaptive Mesh Refinement Applications

Sala; Rico; Beltrán (2020). 2020 IEEE International Conference on Cluster Computing (CLUSTER)


Adaptive Mesh Refinement (AMR) is a prevalent method used by distributed-memory simulation applications to adapt the accuracy of their solutions depending on the turbulent conditions in each of their domain regions. These applications are usually dynamic since their domain areas are refined or coarsened in various refinement stages during their execution. Thus, they periodically redistribute their workloads among processes to avoid load imbalance. Although the de facto standard for scientific computing in distributed environments is MPI, in recent years, pure MPI applications are being ported to hybrid ones, attempting to cope with modern multi-core systems. Recently, the Task-Aware MPI library was proposed to efficiently integrate MPI communications and tasking models, providing also the transparent management of communications issued by tasks. In this paper, we demonstrate the benefits of porting AMR applications to data-flow programming models leveraging that novel hybrid approach. We exploit most of the application parallelism by taskifying all stages, allowing their natural overlap. We employ these techniques on the miniAMR proxy application, which mimics the refinement, load balancing, communication, and computation patterns of general AMR applications. We evaluate how this approach reduces the time in its computation and communication phases while achieving better programmability than other conventional hybrid techniques.

