Current high-performance computer systems used for scientific computing typically combine shared memory computational nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored approaches. Task-based parallel programming has been successful both in simplifying programming and in exploiting the available hardware parallelism on shared memory systems. In this paper we focus on how to extend task parallel programming to distributed memory systems. We use a hierarchical decomposition of tasks and data in order to accommodate the different levels of hardware. We test the proposed programming model on two different applications: a Cholesky factorization and a solver for the shallow water equations. We also compare the performance of our implementation with that of other frameworks for distributed task parallel programming and show that it is competitive.

arXiv:1801.03578v1 [cs.DC] 10 Jan 2018

With respect to the computational work performed by one node, the 1×9 process grid has the smallest variance between nodes and therefore also the lowest maximum work size. The 9×1 process grid leads to a smaller maximum work size than the 3×3 process grid if B is large enough, but suffers from significant load imbalance in the case B = 18. In all cases, the work becomes more evenly distributed as the number of level 1 tasks B grows. The statistics for communication and computation point in different directions, but when comparing with actual run times, we have found that the communication size is the most informative measure. A large total communication size is likely to be detrimental to performance, as it increases both the risk of tasks being left waiting for remote data and the risk of message congestion. Using a square process grid is the factor with the largest impact.
Regarding the block sizes, a large B improves the load balance, but increases both the amount of communication and the number of messages (another indicator not shown in the figures).
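The trade-off between the process grid shape and the number of level 1 tasks B can be illustrated with a small sketch. This is not the paper's code: it assumes a standard 2D block-cyclic distribution of the lower-triangular blocks of a Cholesky factorization, and a toy cost model in which block (i, j) costs j + 1 block operations (its j GEMM updates plus one factor/solve step).

```python
# Toy load-balance illustration (hypothetical cost model, not the paper's
# code): distribute the lower-triangular blocks of a B x B block matrix
# over a P x Q process grid in 2D block-cyclic fashion and sum the
# modelled work each process owns.
from collections import defaultdict

def work_per_process(B, P, Q):
    """Sorted list of total modelled work owned by each process."""
    work = defaultdict(int)
    for i in range(B):
        for j in range(i + 1):              # lower triangle only
            # block-cyclic owner of block (i, j); cost j + 1
            work[(i % P, j % Q)] += j + 1
    return sorted(work.values())

B = 18
for P, Q in [(1, 9), (3, 3), (9, 1)]:
    w = work_per_process(B, P, Q)
    print(f"{P}x{Q} grid: min={w[0]}, max={w[-1]}")
```

Under this toy model the 1×9 grid has the smallest spread and the lowest maximum work, while the 9×1 grid shows the strong imbalance reported for B = 18, consistent with the qualitative observations above.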
By using task-based programming models, application developers who are not necessarily experts in parallel programming get access to the potentially high performance of multicore-based computer systems. We have derived a family of task parallel programming models where data dependencies are represented through data versioning. The benefits of this type of model are that it is easy to represent different types of dependencies and that scheduling decisions can be made locally. Experiments show that a thread-parallel shared memory implementation as well as a hybrid thread/MPI distributed memory implementation scale well on a system with 64 cores. Comparing the hybrid implementation with a pure MPI version, the results are comparable for small numbers of cores, but for larger numbers of cores the gain from using the hybrid model is substantial.
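The idea of representing dependencies through data versioning can be sketched in a few lines. The class and method names below are hypothetical, not the paper's API: each data handle counts completed writes, every task records the version each operand must reach before the task may run, and a local comparison of version numbers is all the scheduler needs.

```python
# Minimal sketch (hypothetical API) of dependency tracking via data
# versioning: a task becomes ready when every operand it touches has
# reached the version the task was registered against.
class Handle:
    def __init__(self, name):
        self.name = name
        self.version = 0        # number of completed writes so far
        self.scheduled = 0      # number of writes registered so far

class Task:
    def __init__(self, fn, reads, writes):
        self.fn = fn
        # version each operand must reach before this task may run
        self.required = {h: h.scheduled for h in reads + writes}
        for h in writes:
            h.scheduled += 1    # this task will produce the next version
        self.writes = writes

    def ready(self):
        # purely local decision: compare version counters, no global state
        return all(h.version >= v for h, v in self.required.items())

    def run(self):
        self.fn()
        for h in self.writes:
            h.version += 1      # publish the new version

a = Handle("a")
t1 = Task(lambda: print("write a"), reads=[], writes=[a])
t2 = Task(lambda: print("read a"), reads=[a], writes=[])
assert not t2.ready()   # t2 must wait for version 1 of a
t1.run()
assert t2.ready()       # the version t2 requires now exists
t2.run()
```

Because read-after-write, write-after-read, and write-after-write orderings all reduce to the same version comparison, different dependency types need no special-case handling in this scheme.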
Task-based parallel programming has shown competitive outcomes in many aspects of parallel programming, such as efficiency, performance, productivity, and scalability. Different software development frameworks take different approaches to providing these benefits to the programmer while keeping the underlying hardware architecture transparent to her. However, since programs are not portable between these frameworks, choosing one framework over another remains a consequential decision for a programmer concerned with the extensibility, adaptivity, maintainability, and interoperability of the programs. In this work, we propose a unified programming interface that a programmer can use to work with different task-based parallel frameworks transparently. In this approach we abstract the common concepts of task-based parallel programming and provide them to the programmer uniformly, in a single programming interface, across all frameworks. We have tested the interface by running programs that implement matrix operations within frameworks optimized for shared and distributed memory architectures and for accelerators, where the cooperation between frameworks is configured externally with no need to modify the programs. Further possible extensions of the interface and potential future research are also described.
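The shape of such a unified interface can be sketched as follows. This is a hypothetical illustration, not the interface proposed in the work: the application is written against an abstract `Framework` class, and a concrete backend (shared memory here; an MPI or accelerator backend would expose the same calls) is selected by configuration rather than by changing the program.

```python
# Hypothetical sketch of a unified task-submission interface: application
# code depends only on the abstract Framework class, so backends can be
# swapped without modifying the program.
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class Framework(ABC):
    @abstractmethod
    def submit(self, fn, *args): ...
    @abstractmethod
    def wait_all(self): ...

class ThreadBackend(Framework):
    """One possible backend; an MPI or GPU backend would implement
    the same two methods on top of its own runtime."""
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(workers)
        self.futures = []

    def submit(self, fn, *args):
        self.futures.append(self.pool.submit(fn, *args))

    def wait_all(self):
        results = [f.result() for f in self.futures]
        self.futures.clear()
        return results

# The application never names the backend directly:
def axpy_block(alpha, x, y):
    """A toy matrix/vector block operation: alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

fw = ThreadBackend()
fw.submit(axpy_block, 2.0, [1, 2], [3, 4])
fw.submit(axpy_block, 0.5, [4, 4], [0, 0])
print(fw.wait_all())   # [[5.0, 8.0], [2.0, 2.0]]
```

Keeping the abstract surface small (submit a task, wait for completion) is what makes it plausible to map the same program onto frameworks with very different runtimes.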